Hiu and Linux

Monday, June 13, 2016

Graphite-web in container

hi all,

I am trying to split the components of the graphite engine into small pieces, so that each one is agile and easy to duplicate when you need to scale out. I have built my first graphite-web container and published it on GitHub. I will continue with the Dockerfiles for carbon-cache and the other components.

Graphite-web: https://github.com/yenonn/docker-graphite

Thursday, June 9, 2016

Setting up a graphite service in CENTOS 7.2

hi all,

I just want to list the steps for installing graphite on CENTOS 7.2.

First, you have to install the prerequisite packages.

1. yum -y install epel-release && yum -y install python-pip python-devel gcc libev libev-devel pycairo rrdtool-python mod_wsgi git httpd libffi-devel

2. pip install django carbon whisper graphite-web django-tagging pytz

3. pip install Twisted==16.0.0

Here are the package versions that I used:
Django==1.11.4
whisper==1.0.2
django-tagging==0.4.3
pytz==2017.2
Twisted==16.0.0

4. chmod a+w /opt/graphite/storage

Next, copy the example conf files and make your changes.

5. cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf

6. cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi

7. cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf

8. cp /opt/graphite/webapp/graphite/local_settings.py.example /opt/graphite/webapp/graphite/local_settings.py
Once you have copied local_settings.py, update TIME_ZONE and SECRET_KEY in it. Use tzselect if you don't know the name of your time zone. The secret key can be any string you choose.
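If you would rather not invent the secret key yourself, here is a minimal sketch that generates a random one and writes it into the settings file. It assumes GNU sed and that the file already has a SECRET_KEY line (possibly commented out); the helper names gen_secret_key and set_secret_key are made up for this example.

```shell
#!/bin/bash
# Hypothetical helpers, not part of graphite-web.

# Print a random 64-character hex string read from /dev/urandom.
gen_secret_key() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Replace the SECRET_KEY line (commented out or not) in the given settings file.
set_secret_key() {
    local file=$1 key=$2
    sed -i "s|^#\{0,1\}SECRET_KEY *=.*|SECRET_KEY = '$key'|" "$file"
}

# Usage (path taken from step 8):
# set_secret_key /opt/graphite/webapp/graphite/local_settings.py "$(gen_secret_key)"
```

The key is passed in as an argument, so you can also keep using a string of your own.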

Now you can start working on your apache settings.

9. cp /opt/graphite/examples/example-graphite-vhost.conf /etc/httpd/conf.d/graphite-vhost.conf

Here is how the lines in graphite-vhost.conf should look for the Alias /content/ and the Directory of /opt/graphite/conf

Comment out the extra mod_wsgi LoadModule line in the other conf file.

10. cat /etc/httpd/conf.modules.d/10-wsgi.conf
#LoadModule wsgi_module modules/mod_wsgi.so
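
Instead of editing the file by hand, the same change can be sketched with sed. This assumes GNU sed (for -i); the function name comment_out_wsgi is made up for this example.

```shell
#!/bin/bash
# Comment out any active "LoadModule wsgi_module" line in the given file.
# Lines that are already commented out are left alone, so it is safe to run twice.
comment_out_wsgi() {
    sed -i 's|^[[:space:]]*LoadModule[[:space:]]\{1,\}wsgi_module|#&|' "$1"
}

# Usage: comment_out_wsgi /etc/httpd/conf.modules.d/10-wsgi.conf
```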

In the default /etc/httpd/conf/httpd.conf, you have to update the ServerName. After that, you can start your web server service.

11. ServerName localhost:80

12. setenforce 0; systemctl enable httpd.service; systemctl start httpd.service

Now you can initialise your django framework. Please answer all the questions asked during the initial setup, and enter your admin credentials and email.

~~13. python /opt/graphite/webapp/graphite/manage.py migrate auth~~

~~14. python /opt/graphite/webapp/graphite/manage.py syncdb~~

13. PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings auth

14. PYTHONPATH=/opt/graphite/webapp django-admin.py syncdb

Please refer here for more up-to-date information on the webapp database setup. Now you can move on to the carbon daemon and start it.

15. python /opt/graphite/bin/carbon-cache.py start

[root@ip-172-31-24-138 ~]# python /opt/graphite/bin/carbon-cache.py start
Starting carbon-cache (instance a)
[root@ip-172-31-24-138 ~]# python /opt/graphite/bin/carbon-cache.py stop

Sending kill signal to pid 25259

[root@ip-172-31-24-138 graphite]# netstat -atun |grep 2003

tcp 0 0 0.0.0.0:2003 0.0.0.0:* LISTEN

Next you will need to verify your carbon port is listening and your web port is running.

16. netstat -atun | grep 2003
tcp 0 0 0.0.0.0:2003 0.0.0.0:* LISTEN

17. netstat -atun | grep 80
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN

After you have started the services, wait for a while. Then look in your graphite storage for newly created whisper files, and keep checking the directory until they appear.

18. [root@localhost ~]# find /opt/graphite/storage/ -name '*.wsp'
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/size.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/queries.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/bulk_queries.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/queues.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/overflow.wsp

/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/memUsage.wsp

Once all this is done, your graphite should be up and running. You can open your browser and go to your graphite web service to verify it. At this moment, you should not see any data coming in from any servers yet; you need to continue with the collectd installation on the client nodes.
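
If you want to see a data point before collectd is in place, you can push one by hand. Carbon's plaintext listener on port 2003 accepts lines of the form "path value timestamp". Here is a minimal sketch; the function names, the CARBON_HOST and CARBON_PORT variables and the metric name test.manual.value are all made up for this example.

```shell
#!/bin/bash
# Format one metric line for carbon's plaintext protocol: "<path> <value> <timestamp>".
# The timestamp defaults to the current time in epoch seconds.
carbon_line() {
    local path=$1 value=$2 ts=${3:-$(date +%s)}
    printf '%s %s %s\n' "$path" "$value" "$ts"
}

# Send one metric to a carbon-cache listener through bash's /dev/tcp.
# The defaults (localhost, 2003) match the listener checked in step 16.
send_metric() {
    local host=${CARBON_HOST:-localhost} port=${CARBON_PORT:-2003}
    carbon_line "$@" > "/dev/tcp/$host/$port"
}

# Usage: send_metric test.manual.value 42
```

After a short while a matching .wsp file should show up under /opt/graphite/storage/whisper.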

Sunday, May 29, 2016

Kernel parameters setting for low latency system

hi all,

I would like to share the settings that I recommend for most low latency systems.

# disable the ipv6 setting
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

# reduce swappiness
vm.swappiness = 1

# increase the socket buffer limits
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216

# increase the mmap limit
vm.max_map_count = 131072

Also, the limits settings (limits.conf format):
* - nofile 100000
* - memlock unlimited
* - nproc 32768
* - as unlimited
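
To check that the kernel parameters really took effect, here is a rough sketch that compares a conf file against the running values under /proc/sys. It assumes the settings were saved to a file and loaded with "sysctl -p FILE"; the function name check_sysctl and its optional second argument (an alternative root directory, handy for testing) are made up for this example.

```shell
#!/bin/bash
# Compare "key = value" lines in a sysctl conf file with the running values.
# Prints one MISMATCH line per key that differs; prints nothing if all match.
check_sysctl() {
    local conf=$1 root=${2:-/proc/sys} key value want have
    while IFS='=' read -r key value; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        # Skip blank lines and comments.
        case $key in ''|'#'*) continue ;; esac
        # Normalise whitespace, because /proc separates multiple values with tabs.
        want=$(printf '%s\n' "$value" | awk '{ $1 = $1; print }')
        have=$(awk '{ $1 = $1; print }' "$root/${key//.//}" 2>/dev/null)
        [ "$want" = "$have" ] || echo "MISMATCH $key: want '$want', have '$have'"
    done < "$conf"
}

# Usage: check_sysctl /path/to/your/sysctl.conf
```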

Monday, May 16, 2016

Bash in action: detecting opening port

hi all,

If you don't have any of the usual unix/linux utilities at hand, e.g. netstat or nc, to detect an open port, here is a simple trick that can help you achieve your goal. Have fun!

# SERVER and port must be set before running this snippet
if timeout 1 bash -c "cat < /dev/null > /dev/tcp/$SERVER/$port"
then
    echo "Connection to $SERVER on port $port succeeded."
else
    echo "Connection to $SERVER on port $port failed."
fi
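
The same trick can be wrapped in a function to check several ports in one go. This is only a sketch; the function names port_open and check_ports are made up, and it still relies on bash's /dev/tcp and the timeout command.

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within one second.
# Host and port are passed as positional parameters ($0 and $1 of the inner
# shell), so nothing is expanded inside the bash -c string itself.
port_open() {
    timeout 1 bash -c 'cat < /dev/null > "/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report the state of several ports on one host.
check_ports() {
    local host=$1 port
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed"
        fi
    done
}

# Usage: check_ports localhost 80 2003
```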

Wednesday, April 27, 2016

Disks IO bottleneck with HP Smart Path enabled

hi all,

If you are on HP ProLiant products, specifically with a Smart Array controller such as the P420 or P440, you need to pay attention to a disk IO bottleneck when HP SSD Smart Path is enabled. This discussion only applies to systems with SSD disks; I am assuming that you have SSD disks attached to your P420 or P440 controller. The symptom is a significantly high await reading in iostat. An await of 10.69 or 14.70 makes little sense for SSD disks.

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 2.00 0.00 2.00 0.00 32.00 16.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 17813.00 0.00 202.00 0.00 144120.00 713.47 2.16 10.69 0.84 17.00
sdd 0.00 35523.00 0.00 405.00 0.00 283544.00 700.11 6.03 14.70 0.97 39.20
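
To spot such devices quickly, here is a small sketch that filters iostat output by the await column. It assumes the older iostat layout shown above, with a "Device:" header and a single await column; the function name high_await and the default threshold of 5 ms are made up for this example.

```shell
#!/bin/bash
# Print "<device> <await>" for devices whose await exceeds a threshold (in ms).
# The await column is located from the "Device:" header line, so the function
# does not depend on a fixed column position.
high_await() {
    awk -v limit="${1:-5}" '
        $1 == "Device:" {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "await") col = i
            next
        }
        col && NF >= col && $col + 0 > limit { print $1, $col }
    '
}

# Usage: iostat -x 1 2 | high_await 5
```

Fed with the sample above and a threshold of 5, it reports sdb and sdd.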

[root@nasilemak root]# hpssacli ctrl all show config detail

Smart Array P440 in Slot 1
Bus Interface: PCI
Slot: 1
....

Array: B
Interface Type: SAS
Unused Space: 0 MB (0.0%)
Used Space: 1.6 TB (100.0%)
Status: OK
Array Type: Data
HP SSD Smart Path: enable

Logical Drive: 2
Size: 447.1 GB
Fault Tolerance: 0
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 256 KB
Status: OK
Caching: Disabled
Unique Identifier: 600508B1001CAAF2E9A8EB80A2264AEC
Disk Name: /dev/sdb
Mount Points: /data/cass01 447.1 GB Partition Number 2
OS Status: LOCKED
Logical Drive Label: 04BAACD3PDNMF0ARH8B30108B5
Drive Type: Data

LD Acceleration Method: HP SSD Smart Path

So, after some reading and research, I decided to disable this option and switch to the controller cache instead. Most of my JBOD and RAID0 systems running hadoop, cassandra and elasticsearch will follow this configuration.

Here is a small script that helps me run the task.

#!/bin/bash

HPSSACLI=`which hpssacli`
# Dump the controller config in lower case, without blank lines
$HPSSACLI ctrl all show config detail | sed -e '/^$/d' | tr '[:upper:]' '[:lower:]' > /tmp/smartarray.out

if grep -qi "HP SSD Smart Path: enable" /tmp/smartarray.out
then
    slot=`grep slot /tmp/smartarray.out | head -n1 | awk '{print $NF}'`
    # The "array:" line sits 6 lines above its "hp ssd smart path" line
    grep -i -B6 "HP SSD Smart Path: enable" /tmp/smartarray.out | awk -F: '/array:/ {print $2}' > /tmp/array.out
    while read array
    do
        # The "logical drive:" line sits 7 lines below the "array:" line
        logicaldrive=`grep -i -A7 "Array: $array" /tmp/smartarray.out | tail -n1 | awk -F": " '{print $2}'`
        echo "Array $array has not disabled on the HP smart path"
        echo "Action: Disabling Array $array, logical drive $logicaldrive the HP smart path now ..."
        if $HPSSACLI controller slot=$slot array $array modify ssdsmartpath=disable && \
           $HPSSACLI controller slot=$slot logicaldrive $logicaldrive modify caching=enable
        then
            echo "Array: $array, logicaldrive: $logicaldrive has been disabled."
        fi
    done < /tmp/array.out
    rm -fr /tmp/smartarray.out /tmp/array.out
else
    echo "All Logical drive has been disabled on HP Smart Path."
fi

And the result looks like the following.

[root@nasilemak root]# ./disable_smartpath.sh
Array b has not disabled on the HP smart path
Action: Disabling Array b, logical drive 2 the HP smart path now ...
Array: b, logicaldrive: 2 has been disabled.
Array c has not disabled on the HP smart path
Action: Disabling Array c, logical drive 3 the HP smart path now ...
Array: c, logicaldrive: 3 has been disabled.
Array d has not disabled on the HP smart path
Action: Disabling Array d, logical drive 4 the HP smart path now ...
Array: d, logicaldrive: 4 has been disabled.

After the change, iostat gives a good result.

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 39853.00 0.00 3943.00 0.00 350368.00 88.86 1.26 0.32 0.02 8.50
sdc 0.00 15181.00 25.00 213.00 4800.00 118088.00 516.34 0.22 0.88 0.17 4.00
sdb 0.00 7247.00 83.00 126.00 20896.00 58984.00 382.20 0.13 0.61 0.26 5.40
sdd 0.00 24676.00 15.00 356.00 3200.00 194424.00 532.68 0.32 0.83 0.22 8.30

I hope this helps you overcome the IO bottleneck on your cluster. Now we have a low latency, high throughput system.

Sunday, April 3, 2016

Supervisord and docker services

hi all,

I have been working on docker containers recently and find that supervisord is a wonderful tool to start/stop services and keep them alive. So, I would like to share my steps for setting it up.

1. Installing supervisord.

wget http://peak.telecommunity.com/dist/ez_setup.py;python ez_setup.py \
&& easy_install supervisor

2. Once that is done, you are ready to define the services that will be governed by supervisord.

Here is an example of /etc/supervisord.conf

[supervisord]
nodaemon=true

[program:sshd]
command=/usr/sbin/sshd -D

[program:redis]
command=/etc/init.d/redis start

[program:rabbitmq-server]
command=/etc/init.d/rabbitmq-server start

[program:sensu-server]
command=/etc/init.d/sensu-server start

[program:uchiwa]
command=/etc/init.d/uchiwa start

[program:sensu-api]
command=/etc/init.d/sensu-api start

3. After that, you can start the supervisord binary.

/usr/bin/supervisord

If you are working with a Dockerfile, it may look like this.


#supervisord
RUN wget http://peak.telecommunity.com/dist/ez_setup.py;python ez_setup.py \
&& easy_install supervisor
ADD files/supervisord.conf /etc/supervisord.conf
#start command
CMD ["/usr/bin/supervisord"]

Wednesday, March 23, 2016

Network packets dropping?

hi all,

Some of my linux servers have been dropping network packets, especially on the receiving side. You can see it in the output of the ifconfig command. When this drags on, it degrades the performance of the OS. I tried many ways to solve the problem: upgrading the firmware version, installing the driver, and even bugging the network engineers, claiming the problem originated from the network. Basically, I was desperate but not able to find the answer.

[hiuy@nasilemak~]$ ifconfig
bond0 Link encap:Ethernet HWaddr 8C:DC:D4:0D:22:50
inet addr:10.104.192.1 Bcast:10.104.192.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:1798666372532 errors:31860797 dropped:806 overruns:31859889 frame:908
TX packets:1350275497458 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:366956654314385 (333.7 TiB) TX bytes:348508200310892 (316.9 TiB)

At last, I came across some internet postings saying that CentOS 6 does not handle CPU interrupts well. You can watch this with "cat /proc/interrupts": many of the interrupt requests are handled by a single CPU, which overwhelms it. The fix is to spread the load evenly across multiple CPUs. After all, it is a waste to leave the other CPUs unused.

The answer to this problem is to install a package called irqbalance and start it right away. The result was heavenly: once the interrupts were spread across all of the CPUs, my network problem was gone.
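
To see how uneven the interrupt load is before and after starting irqbalance, here is a small sketch that adds up the counters in /proc/interrupts per CPU. The function name irq_totals is made up for this example, and the optional file argument only exists so you can run it against a saved copy.

```shell
#!/bin/bash
# Sum the interrupt counters per CPU.
# The header line of /proc/interrupts gives the number of CPUs; rows that do
# not carry a counter for every CPU (such as ERR:) are skipped.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }
        NF > ncpu {
            for (i = 2; i <= ncpu + 1; i++)
                if ($i ~ /^[0-9]+$/) total[i - 2] += $i
        }
        END { for (c = 0; c < ncpu; c++) printf "CPU%d %.0f\n", c, total[c] }
    ' "${1:-/proc/interrupts}"
}

# Usage: irq_totals
```

If one CPU shows a far bigger number than the rest, the interrupts are not being spread.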

Working with Docker

hi all,

Hooray!! I have dockerized my graphite engine. Basically, I crafted a docker image called centos6.6-graphite-web-base. Then I exposed a couple of paths as volumes, so that several containers can write to the same shared paths and files. Now I can spawn a couple of carbon instances, each listening on a specific port number, with no mercy at all. It just works! Docker rocks!!!!!

I used the supervisor http://supervisord.org/, to launch the processes. It is neat.

CONTAINER_NAME="GRAPHITE1"
CONTAINER_PORT="80"
if docker ps | grep -q $CONTAINER_NAME
then
echo "Container is created: $CONTAINER_NAME"
else
echo -n "Creating container: $CONTAINER_NAME - "
docker run -d --restart=on-failure:3 \
-it \
--memory-swap=-1 \
--name $CONTAINER_NAME \
--hostname $CONTAINER_NAME \
-p $CONTAINER_PORT:$CONTAINER_PORT \
-v $WHISPER_STORAGE_PATH:/opt/graphite/storage/whisper \
-v $HTTPD_CONF_PATH:/etc/httpd/conf/httpd.conf:rw \
hiuy/centos6.6-graphite-web-base:0.0 \
/usr/local/bin/svscan /etc/my_services
echo ""
fi

CONTAINER_NAME="CARBON1"
CONTAINER_PORT="56260"
if docker ps | grep -q $CONTAINER_NAME
then
echo "Container is created: $CONTAINER_NAME"
else
echo -n "Creating container: $CONTAINER_NAME - "
docker run -d --restart=on-failure:3 \
-it \
--memory-swap=-1 \
--name $CONTAINER_NAME \
--hostname $CONTAINER_NAME \
--env port=$CONTAINER_PORT \
-p $CONTAINER_PORT:$CONTAINER_PORT \
-v $WHISPER_CONF_PATH:/opt/graphite/conf/storage-schemas.conf:ro \
-v $WHISPER_STORAGE_PATH:/opt/graphite/storage/whisper \
hiuy/centos6.6-carbon-base:0.0 \
/usr/local/bin/svscan /etc/my_services

fi
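
One thing to watch in the checks above: "docker ps | grep -q" matches substrings, so GRAPHITE1 would also match a container named GRAPHITE10. Here is a sketch of an exact-name check; the function name container_running is made up, and it assumes a docker version that supports the --format option of docker ps.

```shell
#!/bin/bash
# Return 0 only if a running container has exactly this name.
# -x matches the whole line, -F treats the name as a fixed string.
container_running() {
    docker ps --format '{{.Names}}' | grep -qxF -- "$1"
}

# Usage:
# if container_running "$CONTAINER_NAME"; then
#     echo "Container is created: $CONTAINER_NAME"
# fi
```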

Cassandra: fighting with hints, not fun!!

hi all,

I am not sure if you have experience supporting a multi-DC cassandra cluster. The cluster runs well as long as the network is not a problem. Disaster happens whenever there is a network hiccup. One of the reasons behind this is the accumulation of cassandra hints. One or more nodes in the cluster accumulate hints for the remote DC, which degrades performance. Basically, you will notice your java GC going crazy, with long pauses. Ouch!!! Sometimes you need to disable the hints in order to keep the cluster running. This is a pain.

Hints are not visible from any of the nodetool commands. So, I coded a simple script to count the accumulated system hints. With this, you can measure the level of hints that have accumulated. If possible, you can disable the hints via the nodetool command, then resume them when the network is back to normal. In any case, you don't want your cluster nodes to hold on to any hints.

#!/bin/bash

HINT_TEMP_FILE="/tmp/hints.txt"
STATUS_TEMP_FILE="/tmp/status"

echo " * Collecting system hints ..."
echo -e "select target_id from system.hints limit 1000000;\n" | /opt/cassandra/bin/cqlsh `hostname -f` > $HINT_TEMP_FILE

sleep 2

echo " * Collecting nodes status ..."
/opt/cassandra/bin/nodetool status > $STATUS_TEMP_FILE

echo " * Calculating system hints ..."

if wc -l $HINT_TEMP_FILE | grep -q "^0"
then
echo " Good news!!! No hints are pending!!!"
exit 0
fi

echo ""
echo " * Total hints: `grep -E "\w{8}-(\w{4}-){3}\w{12}" $HINT_TEMP_FILE | wc -l`"
for i in `grep -E "\w{8}-(\w{4}-){3}\w{12}" $HINT_TEMP_FILE | sort -u`
do
ip=`grep $i $STATUS_TEMP_FILE| awk '{print $2}'`
myhost=`host $ip | awk '{print $NF}'`
echo -e "Host id: $i \
IP: $ip \
Hostname: $myhost \
Count: `grep $i $HINT_TEMP_FILE | wc -l`"
done
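
The loop above runs grep over the hints file once per host id. As an alternative sketch, the counts can be collected in a single pass with sort and uniq. The function name count_hints is made up for this example; it assumes the same cqlsh output file as above and does not resolve the IP or hostname.

```shell
#!/bin/bash
# Count hints per target host id in one pass over the cqlsh output.
# Prints "<host id> <count>" lines, sorted by host id.
count_hints() {
    grep -Eo '[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}' "$1" \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage: count_hints /tmp/hints.txt
```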

Bash in action: multithreading

hi all,

I hit a bottleneck with a for loop: I had a function that could be run concurrently by a number of threads, but I didn't know how to code that in bash. I bumped into this site: http://stackoverflow.com/questions/1455695/forking-multi-threaded-processes-bash, which inspired me to code it. In the end I managed to code a multithreading function in bash. Here is how it looks. I hope it helps some devops engineers solve their problems.

[hiuy@nasilemak ~]$ ./test.sh
Waiting for jobs: 1199 1200 1201
Waiting for jobs: 1199 1200 1201
Waiting for jobs: 1199 1200 1201
Waiting for jobs: 1199 1200 1201
Waiting for jobs: 1199 1200 1201
Waiting for jobs: 1199 1200 1201
Job is Done -- 1199
Waiting for jobs: 1200 1201 1445
Waiting for jobs: 1200 1201 1445
Waiting for jobs: 1200 1201 1445
Waiting for jobs: 1200 1201 1445
Waiting for jobs: 1200 1201 1445
Job is Done -- 1200
Waiting for jobs: 1201 1445 2099
Waiting for jobs: 1201 1445 2099
Waiting for jobs: 1201 1445 2099
Waiting for jobs: 1201 1445 2099
Waiting for jobs: 1201 1445 2099
Job is Done -- 1201
Waiting for jobs: 1445 2099 2118
Job is Done -- 2099
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Waiting for jobs: 1445 2118 2121
Job is Done -- 1445
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Waiting for jobs: 2118 2121
Job is Done -- 2118
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Waiting for jobs: 2121
Job is Done -- 2121

Here is the code
#!/bin/bash

TOTAL_THREAD=3
declare -a pids

function wait_pid()
{
pid=$1
if [ -n "$pid" ] && kill -0 "$pid" &>/dev/null
then
#this is a valid pid, then add it
pids=(${pids[@]} $pid)

if [ "${#pids[@]}" -lt "$TOTAL_THREAD" ]
then
return 0
elif [ "${#pids[@]}" -eq "$TOTAL_THREAD" ]
then
while [ "${#pids[@]}" -eq "$TOTAL_THREAD" ]
do
if [ "${#pids[@]}" -lt $TOTAL_THREAD ]
then
break
fi
echo "Waiting for jobs: ${pids[@]}"
local range=$(eval echo {0..$((${#pids[@]}-1))})
local i
for i in $range; do
if ! kill -0 ${pids[$i]} 2> /dev/null; then
echo "Job is Done -- ${pids[$i]}"
unset pids[$i]
fi
done
pids=("${pids[@]}") # Expunge nulls created by unset.
sleep 1
done
return 0
fi
fi

}

function wait_complete()
{
while [ "${#pids[@]}" -gt "0" ]
do
echo "Waiting for jobs: ${pids[@]}"
local range=$(eval echo {0..$((${#pids[@]}-1))})
local i
for i in $range; do
if ! kill -0 ${pids[$i]} 2> /dev/null; then
echo "Job is Done -- ${pids[$i]}"
unset pids[$i]
fi
done
pids=("${pids[@]}") # Expunge nulls created by unset.
sleep 1
done
}

for s in 5 10 15 25 5 25 60
do
sleep $s &
wait_pid $!
done

wait_complete
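
For comparison, here is a different technique from the script above: when every job is an external command that takes one argument, xargs with -P can cap the number of concurrent processes for you. This is only a sketch; the function name run_parallel is made up, and it cannot run shell functions, only external commands.

```shell
#!/bin/bash
# Run CMD once per argument, with at most MAX processes at the same time.
# Usage: run_parallel MAX CMD ARG...
run_parallel() {
    local max=$1 cmd=$2
    shift 2
    # One argument per line; xargs starts up to $max commands in parallel
    # and returns when all of them have finished.
    printf '%s\n' "$@" | xargs -n 1 -P "$max" "$cmd"
}

# Usage, same workload as the loop above:
# run_parallel 3 sleep 5 10 15 25 5 25 60
```

You lose the "Waiting for jobs" progress lines, but there is no pid bookkeeping to maintain.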