Wednesday, March 23, 2016

Network packets dropping?

hi all,

I have some Linux servers dropping network packets, especially on the receive side; you can see it in the ifconfig output below. With this dragging on the OS, performance degrades. I tried many ways to solve the problem: I upgraded the firmware, reinstalled the driver, and even bugged the network engineer, claiming the problem originated from the network side. Basically, I was desperate and could not find the answer.

[hiuy@nasilemak~]$ ifconfig
bond0     Link encap:Ethernet  HWaddr 8C:DC:D4:0D:22:50
          inet addr:10.104.192.1  Bcast:10.104.192.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:1798666372532 errors:31860797 dropped:806 overruns:31859889 frame:908
          TX packets:1350275497458 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:366956654314385 (333.7 TiB)  TX bytes:348508200310892 (316.9 TiB)

At last, I came across some internet postings saying that CentOS 6 does not balance CPU interrupts well by default. If you look at "cat /proc/interrupts", you will see that most of the interrupt requests are handled by a single CPU, which overwhelms it. The fix is to spread the load evenly across multiple CPUs; leaving the other CPUs idle is just a waste.
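
To see the imbalance yourself, a quick check like the one below works (the "eth" pattern is a placeholder for your NIC's interrupt name, so adjust it to your interface):

# each numeric column is a CPU; if only one column keeps growing,
# that CPU is handling nearly all of the NIC interrupts
grep -i eth /proc/interrupts

# watch it live to confirm which CPU the counters climb on
watch -n1 'grep -i eth /proc/interrupts'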

The answer to this problem is to install a package called irqbalance and start it right away. The result was heavenly: once the interrupts were spread across all of the CPUs, the drops were gone. Magically resolved my problem.
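
On CentOS 6 it boils down to three commands:

# install irqbalance, start it now, and make it survive reboots
yum install -y irqbalance
service irqbalance start
chkconfig irqbalance on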

Working with Docker

hi all,

Hooray!! I have dockerized my Graphite engine. Basically, I crafted a Docker image called centos6.6-graphite-web-base. Then I exposed a couple of paths as volumes so that multiple containers can write to the same shared paths and files. Now I can spawn as many carbon instances as I like, each listening on its own port, with no mercy at all. It just works! Docker rocks!!!!

I used supervisor (http://supervisord.org/) to launch the processes inside the containers. It is neat.

CONTAINER_NAME="GRAPHITE1"
CONTAINER_PORT="80"
if docker ps | grep -q $CONTAINER_NAME
then
        echo "Container is created: $CONTAINER_NAME"
else
        echo -n "Creating container: $CONTAINER_NAME - "
        docker run -d --restart=on-failure:3 \
                -it \
                --memory-swap=-1 \
                --name $CONTAINER_NAME \
                --hostname $CONTAINER_NAME \
                -p $CONTAINER_PORT:$CONTAINER_PORT \
                -v $WHISPER_STORAGE_PATH:/opt/graphite/storage/whisper \
                -v $HTTPD_CONF_PATH:/etc/httpd/conf/httpd.conf:rw \
                hiuy/centos6.6-graphite-web-base:0.0 \
                /usr/local/bin/svscan /etc/my_services
        echo ""
fi

CONTAINER_NAME="CARBON1"
CONTAINER_PORT="56260"
if docker ps -a | grep -q "$CONTAINER_NAME"
then
        echo "Container already exists: $CONTAINER_NAME"
else
        echo -n "Creating container: $CONTAINER_NAME - "
        docker run -d --restart=on-failure:3 \
                -it \
                --memory-swap=-1 \
                --name $CONTAINER_NAME \
                --hostname $CONTAINER_NAME \
                --env port=$CONTAINER_PORT \
                -p $CONTAINER_PORT:$CONTAINER_PORT \
                -v $WHISPER_CONF_PATH:/opt/graphite/conf/storage-schemas.conf:ro \
                -v $WHISPER_STORAGE_PATH:/opt/graphite/storage/whisper \
                hiuy/centos6.6-carbon-base:0.0 \
                /usr/local/bin/svscan /etc/my_services
fi
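
To launch both containers, I point the volume variables at real host paths first. A hypothetical invocation (these paths are placeholders, not my actual ones):

# hypothetical host paths -- point these at your Graphite data and config
export WHISPER_STORAGE_PATH=/data/graphite/whisper
export WHISPER_CONF_PATH=/data/graphite/storage-schemas.conf
export HTTPD_CONF_PATH=/data/graphite/httpd.conf

./launch_graphite.sh   # the snippet above, saved as a script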

Cassandra: fighting with hints, not fun!!

hi all,

I am not sure if you have experience supporting a multi-DC Cassandra cluster. The cluster runs well when the network is not a problem; disaster strikes whenever there is a network hiccup. One of the reasons behind this is hint accumulation: some nodes in the cluster accumulate hints for the remote DC, causing performance degradation. Basically, you will notice your Java GC going crazy, with long pauses. Ouch!!! Sometimes you need to disable hints just to keep the cluster running, which is a pain. Hints are not visible from any of the nodetool commands, so I coded a simple script to count the accumulated system hints. With this, you can measure the level of hints that have accumulated, disable hinted handoff if needed, and resume it when the network is back to normal. Ideally, you never want your cluster nodes to keep any hints at all.
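
For the disable/resume part, nodetool already has the knobs (these subcommands exist in Cassandra 1.2 and later, as far as I can tell):

# stop storing new hints while the remote DC is flapping
/opt/cassandra/bin/nodetool disablehandoff

# resume once the network is healthy again
/opt/cassandra/bin/nodetool enablehandoff

And here is the counting script: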



#!/bin/bash

HINT_TEMP_FILE="/tmp/hints.txt"
STATUS_TEMP_FILE="/tmp/status"

echo " * Collecting system hints ..."
echo -e "select target_id from system.hints limit 1000000;\n" | /opt/cassandra/bin/cqlsh `hostname -f` > $HINT_TEMP_FILE

sleep 2

echo " * Collecting nodes status ..."
/opt/cassandra/bin/nodetool status > $STATUS_TEMP_FILE


echo " * Calculating system hints ..."

if wc -l $HINT_TEMP_FILE | grep -q "^0"
then
        echo "   Good news!!! No hints are pending!!!"
        exit 0
fi

echo ""
echo " * Total hints: `grep -E "\w{8}-(\w{4}-){3}\w{12}" $HINT_TEMP_FILE | wc -l`"
# one line per distinct target node: host id, IP, hostname, hint count
for i in `grep -E "\w{8}-(\w{4}-){3}\w{12}" $HINT_TEMP_FILE | sort -u`
do
        ip=`grep $i $STATUS_TEMP_FILE| awk '{print $2}'`
        myhost=`host $ip | awk '{print $NF}'`
        echo -e "Host id: $i \
                IP: $ip \
                Hostname: $myhost \
                Count: `grep $i $HINT_TEMP_FILE | wc -l`"
done

Bash in action: multithreading

hi all,

I hit a bottleneck in a for loop: I had a function that could run concurrently across some number of threads, but I did not know how to code that in bash. I bumped into this site: http://stackoverflow.com/questions/1455695/forking-multi-threaded-processes-bash, which inspired me, and in the end I managed to code a multithreading helper in bash. Here is how it looks. I hope it helps some devops engineers solve their problems.

[hiuy@nasilemak ~]$ ./test.sh
Waiting for jobs: 1199 1200 1201
Waiting for jobs: 1199 1200 1201
...
Job is Done -- 1199
Waiting for jobs: 1200 1201 1445
...
Job is Done -- 1200
Waiting for jobs: 1201 1445 2099
...
Job is Done -- 1201
Waiting for jobs: 1445 2099 2118
Job is Done -- 2099
Waiting for jobs: 1445 2118 2121
...
Job is Done -- 1445
Waiting for jobs: 2118 2121
...
Job is Done -- 2118
Waiting for jobs: 2121
...
Job is Done -- 2121

Here is the code:
#!/bin/bash

TOTAL_THREAD=3
declare -a pids   # PIDs of in-flight background jobs

function wait_pid()
{
        pid=$1
        if [ -n "$pid" ] && kill -0 "$pid" &>/dev/null
        then
                # this is a valid, running pid, so track it
                pids=(${pids[@]} $pid)

                # once we hit the thread limit, block until a slot frees up
                while [ "${#pids[@]}" -ge "$TOTAL_THREAD" ]
                do
                        echo "Waiting for jobs: ${pids[@]}"
                        local range=$(eval echo {0..$((${#pids[@]}-1))})
                        local i
                        for i in $range; do
                                # kill -0 fails once the job has exited
                                if ! kill -0 ${pids[$i]} 2> /dev/null; then
                                        echo "Job is Done -- ${pids[$i]}"
                                        unset pids[$i]
                                fi
                        done
                        pids=("${pids[@]}") # Expunge nulls created by unset.
                        sleep 1
                done
        fi
        return 0
}

function wait_complete()
{
        # reap whatever background jobs are still running until none are left
        while [ "${#pids[@]}" -gt "0" ]
        do
                echo "Waiting for jobs: ${pids[@]}"
                local range=$(eval echo {0..$((${#pids[@]}-1))})
                local i
                for i in $range; do
                        if ! kill -0 ${pids[$i]} 2> /dev/null; then
                                echo "Job is Done -- ${pids[$i]}"
                                unset pids[$i]
                        fi
                done
                pids=("${pids[@]}") # Expunge nulls created by unset.
                sleep 1
        done
}

# simulate seven jobs of varying length; at most TOTAL_THREAD run at once
for s in 5 10 15 25 5 25 60
do
        sleep $s &
        wait_pid $!     # hand the new job's pid to the throttler
done


wait_complete
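
As a side note, if your bash is 4.3 or newer, wait -n together with jobs -rp gets you the same throttling with far less code. A minimal sketch of that alternative, not what I ran above:

#!/bin/bash

TOTAL_THREAD=3

for s in 5 10 15 25 5 25 60
do
        # block while TOTAL_THREAD jobs are still running
        while [ "$(jobs -rp | wc -l)" -ge "$TOTAL_THREAD" ]
        do
                wait -n   # returns as soon as any one background job exits (bash 4.3+)
        done
        sleep "$s" &
done
wait   # drain the remaining jobs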

Cassandra sstableupgrade function

hi all,

I guess this is a good common function that will help a lot of engineers working on Cassandra upgrades. Personally, I coded an sstableupgrade wrapper function which I think is neat and stable, so I would like to share it with all of you. I used it for my Cassandra upgrade from version 1.2.8 to 2.0.14.



function sstable_upgrade()
{
        echo_header "SSTABLE UPGRADE"
        # optional first argument: upgrade the sstables of a named snapshot
        MYSNAPSHOT_NAME=""
        if [ ! -z "$1" ]
        then
                MYSNAPSHOT_NAME="$1"
        fi
        SSTABLEUPGRADE_BIN="/opt/cassandra-2.0.14/bin/sstableupgrade"
        # bump the tool's JVM heap from the default 256M to 2G
        sudo sed -i 's|256M|2G|g' $SSTABLEUPGRADE_BIN

        echo_info "Harvesting keyspaces and column families to $CASSANDRA_BACKUP/backup_keyspace.out"
        # old-format (ic-*) sstables still on disk tell us which
        # keyspace/column-family pairs need upgrading
        if sudo find /data/ -name "*-ic*Data.db" | grep -v snapshot | awk -F"/" '{print $4" "$5}' | sort -u > $CASSANDRA_BACKUP/backup_keyspace.out
        then
                while read line
                do
                        echo_info "SSTABLE upgrade: $SSTABLEUPGRADE_BIN $line $MYSNAPSHOT_NAME"
                        sudo -u $CASS_USER /bin/bash -c "$SSTABLEUPGRADE_BIN $line $MYSNAPSHOT_NAME"
                        sleep 5
                done < $CASSANDRA_BACKUP/backup_keyspace.out
        else
                echo_warn "Failed harvesting keyspaces and column families"
                exit 1
        fi
}

Here is the output:

> INFO: SSTABLE upgrade: /opt/cassandra-2.0.14/bin/sstableupgrade system schema_keyspaces
Found 2 sstables that need upgrading.
Upgrading SSTableReader(path='/data/data02/system/schema_keyspaces/system-schema_keyspaces-ic-214-Data.db')
Upgrade of SSTableReader(path='/data/data02/system/schema_keyspaces/system-schema_keyspaces-ic-214-Data.db') complete.
Upgrading SSTableReader(path='/data/data02/system/schema_keyspaces/system-schema_keyspaces-ic-213-Data.db')
Upgrade of SSTableReader(path='/data/data02/system/schema_keyspaces/system-schema_keyspaces-ic-213-Data.db') complete.
  > INFO: SSTABLE upgrade: /opt/cassandra-2.0.14/bin/sstableupgrade system_traces events
Found 2 sstables that need upgrading.
Upgrading SSTableReader(path='/data/data03/system_traces/events/system_traces-events-ic-1-Data.db')
Upgrade of SSTableReader(path='/data/data03/system_traces/events/system_traces-events-ic-1-Data.db') complete.
Upgrading SSTableReader(path='/data/data02/system_traces/events/system_traces-events-ic-2-Data.db')
Upgrade of SSTableReader(path='/data/data02/system_traces/events/system_traces-events-ic-2-Data.db') complete.
  > INFO: SSTABLE upgrade: /opt/cassandra-2.0.14/bin/sstableupgrade system_traces sessions
Found 2 sstables that need upgrading.
Upgrading SSTableReader(path='/data/data03/system_traces/sessions/system_traces-sessions-ic-1-Data.db')
Upgrade of SSTableReader(path='/data/data03/system_traces/sessions/system_traces-sessions-ic-1-Data.db') complete.
Upgrading SSTableReader(path='/data/data03/system_traces/sessions/system_traces-sessions-ic-2-Data.db')
Upgrade of SSTableReader(path='/data/data03/system_traces/sessions/system_traces-sessions-ic-2-Data.db') complete.
  > INFO: Done
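
For completeness, this is roughly how I wire the function up. echo_header, echo_info and echo_warn are trivial logging helpers from my wrapper script, and the values below are placeholders, not my production ones:

# placeholders -- adjust to your environment
CASSANDRA_BACKUP=/var/backup/cassandra
CASS_USER=cassandra

function echo_header() { echo "=== $* ==="; }
function echo_info()   { echo "  > INFO: $*"; }
function echo_warn()   { echo "  > WARN: $*"; }

sstable_upgrade "my_pre_upgrade_snapshot"   # the snapshot name argument is optional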