Wednesday, March 23, 2016

Cassandra: fighting with hints, not fun!!

Hi all,

I am not sure if you have experience supporting a multi-DC Cassandra cluster. The cluster runs well as long as the network is healthy; disaster strikes whenever there is a network hiccup. One of the reasons is Cassandra hint accumulation: one or more nodes in the cluster pile up hints for the remote DC, which degrades performance. Basically, you will notice the Java GC going crazy, with long pauses. Ouch!!! Sometimes you need to disable hints just to keep the cluster running. This is a pain. Hints are not visible from any of the nodetool commands, so I coded a simple script to count the accumulated system hints. With it, you can measure how many hints have accumulated for each target node. If necessary, you can disable hints via nodetool, then re-enable them when the network is back to normal. In any case, you don't want your cluster nodes to be holding on to a pile of hints.
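
The disable-then-resume step could be wrapped in a small helper. This is only a rough sketch: the threshold value, the function names, and the DRY_RUN switch are my own assumptions for illustration, and the only real commands it relies on are nodetool disablehandoff and nodetool enablehandoff.

```shell
#!/bin/bash
# Sketch: pick a hinted-handoff action from a hint count.
# HINT_THRESHOLD, DRY_RUN and the function names are hypothetical.

HINT_THRESHOLD="${HINT_THRESHOLD:-100000}"   # assumed limit; tune per cluster
NODETOOL="${NODETOOL:-/opt/cassandra/bin/nodetool}"

# Print the nodetool subcommand that matches the given hint count.
handoff_action() {
        local count="$1"
        if [ "$count" -gt "$HINT_THRESHOLD" ]
        then
                echo "disablehandoff"
        else
                echo "enablehandoff"
        fi
}

# Run the chosen subcommand, or only print it when DRY_RUN=1 (the default).
apply_handoff_action() {
        local action
        action=$(handoff_action "$1")
        if [ "${DRY_RUN:-1}" = "1" ]
        then
                echo "would run: $NODETOOL $action"
        else
                "$NODETOOL" "$action"
        fi
}
```

Calling apply_handoff_action with a hint count only prints the command unless DRY_RUN=0 is set, so it is safe to try first. The count it takes as input can come from the counting script below.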



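#!/bin/bash
# Count pending hints per target node.
# Assumes a Cassandra version that keeps hints in the system.hints table
# (2.x; from 3.0 on, hints are stored as files on disk instead).

HINT_TEMP_FILE="/tmp/hints.txt"
STATUS_TEMP_FILE="/tmp/status"
# A host id is a UUID: 8-4-4-4-12 hex digits.
UUID_RE='[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}'

echo " * Collecting system hints ..."
echo "select target_id from system.hints limit 1000000;" | /opt/cassandra/bin/cqlsh "$(hostname -f)" > "$HINT_TEMP_FILE"

sleep 2

echo " * Collecting nodes status ..."
/opt/cassandra/bin/nodetool status > "$STATUS_TEMP_FILE"

echo " * Calculating system hints ..."

# cqlsh prints a header and a row-count footer even for an empty result,
# so count the UUID lines rather than the lines in the file.
total=$(grep -cE "$UUID_RE" "$HINT_TEMP_FILE")
if [ "${total:-0}" -eq 0 ]
then
        echo "   Good news!!! No hints are pending!!!"
        exit 0
fi

echo ""
echo " * Total hints: $total"
for i in $(grep -oE "$UUID_RE" "$HINT_TEMP_FILE" | sort -u)
do
        # Map the host id to its address via the nodetool status output.
        ip=$(grep "$i" "$STATUS_TEMP_FILE" | awk '{print $2}')
        myhost=$(host "$ip" | awk '{print $NF}')
        echo "Host id: $i   IP: $ip   Hostname: $myhost   Count: $(grep -c "$i" "$HINT_TEMP_FILE")"
done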
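
The loop above runs grep once per host id. As a possible variation, the per-target tally can be done in a single pass over the cqlsh output. This is a minimal sketch, not a replacement for the script: the function name hints_per_target is made up for illustration, and it assumes the host ids appear in the output as plain UUIDs.

```shell
#!/bin/bash
# Sketch: tally hints per target host id in one pass.
# Reads cqlsh output on stdin; hints_per_target is a hypothetical name.
#
# Possible usage, reusing the query from the script above:
#   echo "select target_id from system.hints limit 1000000;" \
#     | /opt/cassandra/bin/cqlsh "$(hostname -f)" | hints_per_target

UUID_RE='[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}'

# Print "<host id> <count>" for each target, largest backlog first.
hints_per_target() {
        grep -oE "$UUID_RE" | sort | uniq -c | sort -rn | awk '{print $2, $1}'
}
```

The header and the row-count footer that cqlsh prints are ignored because they contain no UUID, so an empty result simply produces no output.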
