Thursday, December 1, 2016

Cloudera HDFS kerberized failure: GSSException No Valid credentials

hi hadooper,

This is a problem you may run into once you have enabled Kerberos on a Cloudera cluster.

Here is what the Hadoop log looks like:

2016-12-01 20:28:21,650 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8022: readAndProcess from client 172.31.6.120 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]]
2016-12-01 20:28:23,457 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8022: readAndProcess from client 172.31.0.159 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]]
2016-12-01 20:28:23,627 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8022: readAndProcess from client 172.31.0.158 threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]]

The quick remedy is to apply the Java Cryptography Extension (JCE) Unlimited Strength policy files on all the nodes in the cluster.

Here are the steps to apply the JCE jar files.

1. Download the JCE policy tarball, which contains the following jar files.

US_export_policy.jar
local_policy.jar

2. Copy them onto each node, overwriting the existing jar files at:

/usr/java/jdk1.7.0_67-cloudera/jre/lib/security/US_export_policy.jar
/usr/java/jdk1.7.0_67-cloudera/jre/lib/security/local_policy.jar

3. Make sure the permissions and ownership of the files are retained.

4. Restart the Hadoop HDFS services.

5. Check the logs under /var/log/hadoop-hdfs/* and verify that the same errors no longer appear.

6. Verify that your kerberized HDFS is working properly.

[root@ip-172-31-0-157 ~]# kinit cloudera-scm/admin
Password for cloudera-scm/admin@EXAMPLE.COM:
[root@ip-172-31-0-157 ~]# hadoop fs -ls /
Found 1 items
drwxrwxrwt   - hdfs supergroup          0 2016-11-30 21:59 /tmp

7. If you wish to increase the verbosity of the Kerberos output, you can export the following environment variable, e.g.

export HADOOP_OPTS="-Dsun.security.krb5.debug=true"

8. If you wish to renew the ticket (TGT):

kinit -R

9. The Kerberos client configuration /etc/krb5.conf is also important: the encryption types it permits must match the ones your tickets are issued with, otherwise you will see errors like the ones below. In this example the configuration only permits rc4-hmac, while the TGT in the cache is AES-256 (key type 18), so the client cannot use it.

[root@ip-172-31-11-158 197-hdfs-NAMENODE]# cat /etc/krb5.conf
[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = rc4-hmac
default_tkt_enctypes = rc4-hmac
permitted_enctypes = rc4-hmac
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
EXAMPLE.COM = {
kdc = ip-172-31-25-156.ap-southeast-1.compute.internal
admin_server = ip-172-31-25-156.ap-southeast-1.compute.internal
}

[root@ip-172-31-25-156 ~]# hadoop fs -ls /
Java config name: null Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_0
>>>DEBUG  client principal is hiuy@EXAMPLE.COM
>>>DEBUG server principal is krbtgt/EXAMPLE.COM@EXAMPLE.COM
>>>DEBUG key type: 18
>>>DEBUG auth time: Fri Dec 02 03:42:32 EST 2016
>>>DEBUG start time: Fri Dec 02 03:42:32 EST 2016
>>>DEBUG end time: Sat Dec 03 03:42:32 EST 2016
>>>DEBUG renew_till time: Fri Dec 02 03:42:32 EST 2016
>>> CCacheInputStream: readFlags()  FORWARDABLE; RENEWABLE; INITIAL;
>>>DEBUG  client principal is hiuy@EXAMPLE.COM
>>>DEBUG server principal is X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/EXAMPLE.COM@EXAMPLE.COM
>>>DEBUG key type: 0
>>>DEBUG auth time: Wed Dec 31 19:00:00 EST 1969
>>>DEBUG start time: null
>>>DEBUG end time: Wed Dec 31 19:00:00 EST 1969
>>>DEBUG renew_till time: null
>>> CCacheInputStream: readFlags()
>>> unsupported key type found the default TGT: 18
16/12/02 04:23:11 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
16/12/02 04:23:11 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
16/12/02 04:23:11 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
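Before restarting services across the whole cluster, a quick sanity check I find handy (not an official Cloudera procedure) is to confirm that the JDK really accepts AES-256 and to see which encryption types your TGT actually carries. This assumes the Cloudera-bundled JDK from step 2 is the one your services run on:

/usr/java/jdk1.7.0_67-cloudera/bin/jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'
# prints 2147483647 once the unlimited strength policy files are in place, 128 otherwise

klist -e
# shows the enctypes of the cached tickets; compare them against permitted_enctypes in /etc/krb5.conf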

Friday, August 12, 2016

HDFS HA failover script


hi,

Assuming that you have HDFS HA enabled, here is a small utility script I put together to make manual failover handy. Enjoy... :)

#!/bin/bash

SUHDFS="sudo -u hdfs hdfs"

nameservice=`$SUHDFS getconf -confKey dfs.nameservices`

echo "Nameservice: $nameservice"

serviceIds=`$SUHDFS getconf -confKey dfs.ha.namenodes.$nameservice | sed -e 's|,| |g'`
state=""
is_active=""
is_standby=""
for Id in `echo $serviceIds`
do
        namenode_hostname=`$SUHDFS getconf -confKey dfs.namenode.rpc-address.$nameservice.$Id`
        state=`$SUHDFS haadmin -getServiceState $Id`
        if [ "$state" == "active" ]
        then
                is_active="$Id"
        fi
        if [ "$state" == "standby" ]
        then
                is_standby="$Id"
        fi

        echo "Hostname : $namenode_hostname"
        echo "Service ID: $Id ($state)"
done

echo ""
echo -n "Do you want to do a failover from $is_active (active) -> $is_standby (standby)?: [y/n]"
read ans

if [ "$ans" = "y" ]
then
        echo " >> failing over now ...."
        echo " Executing >>hdfs haadmin -failover $is_active $is_standby"
        $SUHDFS haadmin -failover $is_active $is_standby
        if [ "$?" == "0" ]
        then
                echo " >> Done"
        else
                echo " >> Failed"
        fi
else
        echo ">> Exitting ..."
fi

Here is the result.

[root@ip-172-31-17-185 ~]# ./haadmin.sh
Nameservice: nameservice1
Hostname : ip-172-31-17-183.ap-southeast-1.compute.internal:8020
Service ID: namenode22 (standby)
Hostname : ip-172-31-17-184.ap-southeast-1.compute.internal:8020
Service ID: namenode37 (active)

Do you want to do a failover from namenode37 (active) -> namenode22 (standby)?: [y/n]y
 >> failing over now ....
 Executing >>hdfs haadmin -failover namenode37 namenode22
Failover to NameNode at ip-172-31-17-183.ap-southeast-1.compute.internal/172.31.17.183:8022 successful
 >> Done
[root@ip-172-31-17-185 ~]# ./haadmin.sh
Nameservice: nameservice1
Hostname : ip-172-31-17-183.ap-southeast-1.compute.internal:8020
Service ID: namenode22 (active)
Hostname : ip-172-31-17-184.ap-southeast-1.compute.internal:8020
Service ID: namenode37 (standby)

Do you want to do a failover from namenode22 (active) -> namenode37 (standby)?: [y/n]n
>> Exiting ...

Wednesday, August 10, 2016

Installing Python3.5 on CENTOS 6.8

hi all,

I would like to list down all the steps to install Python 3.5 on CentOS 6.8.

1. Install the prerequisite packages needed to build Python from source.

yum -y groupinstall "Development tools"


yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel 

2. Download the Python 3.5 source tarball from python.org, then unpack it.


tar xvfz Python-3.5.*.tgz

cd Python-3.5.*

3. Compile Python from source and install it.


./configure --prefix=/usr/local --enable-shared LDFLAGS="-Wl,-rpath /usr/local/lib"

make && make altinstall

ln -s /usr/local/bin/python3.5 /usr/bin/python3.5

4. Download get-pip.py (from bootstrap.pypa.io) and install pip.


python3.5 get-pip.py

ln -s /usr/local/bin/pip /usr/bin/pip

5. Voila, you have python3.5 ready.

[root@ip-172-31-18-184 ec2-user]# python3.5
Python 3.5.2 (default, Aug 10 2016, 23:27:34)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

6. Pip is ready too

[root@ip-172-31-18-184 ec2-user]# pip --version
pip 8.1.2 from /usr/local/lib/python3.5/site-packages (python 3.5)
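As a quick smoke test of the new interpreter, you can create a throwaway virtual environment and install something into it (requests below is just an example package, any package will do):

python3.5 -m venv /tmp/py35-test
/tmp/py35-test/bin/pip install requests
/tmp/py35-test/bin/python -c "import requests, sys; print(sys.version)"
rm -rf /tmp/py35-test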

Hope it helps.

Tuesday, August 9, 2016

Exploration on Cloudera: managing services without Cloudera Manager

hi,

The Cloudera hadoop ecosystem is a wonderful product. Many engineers are curious about how Cloudera controls the hadoop processes, e.g. how to start/stop a NameNode, ResourceManager, and other services without actually logging in to the Cloudera Manager portal. I really like the way Cloudera engineered the solution: the core of the technology is supervisord. For a more detailed explanation, you can visit the Cloudera documentation website (https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cm_intro_primer.html).

The bottom line of this post is to share my findings on how to control the hadoop processes (start/stop/status) from the command line instead of the web portal.

I am assuming that you have an up and running Cloudera hadoop cluster. I installed a bare minimal set of cluster services, including ZooKeeper, HDFS, and YARN, on the AWS cloud with 4 x m4.xlarge instances. That's all.

To start with, I will explore the roles of a node.

[root@ip-172-31-17-183 ec2-user]# jps
2572 NameNode
2669 ResourceManager
3562 Jps

From here, I know this node is serving as a NameNode and ResourceManager. That's beautiful. Digging further into the running processes, e.g. the ResourceManager pid, what caught my attention is the classpath entry /var/run/cloudera-scm-agent/process/59-yarn-RESOURCEMANAGER.

/usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_resourcemanager -Xmx1000m -Djava.net.preferIPv4Stack=true -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dhadoop.event.appender=,EventCatcher -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop-cmf-yarn-RESOURCEMANAGER-ip-172-31-17-183.ap-southeast-1.compute.internal.log.out -Dyarn.log.file=hadoop-cmf-yarn-RESOURCEMANAGER-ip-172-31-17-183.ap-southeast-1.compute.internal.log.out -Dyarn.home.dir=/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/lib/native -classpath /var/run/cloudera-scm-agent/process/59-yarn-RESOURCEMANAGER:/var/run/cloudera-scm-agent/process/59-yarn-RESOURCEMANAGER:/var/run/cloudera-scm-agent/process/59-yarn-RESOURCEMANAGER:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/.//*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-mapreduce/.//*:/usr/share/cmf/lib/plugins/tt-instrumentation-5.8.1.jar:/usr/share/cmf/lib/plugins/event-publish-5.8.1-shaded.jar:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-yarn/lib/*:/var/run/cloudera-scm-agent/process/59-yarn-RESOURCEMANAGER/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

Also, from the pstree output I can see that the ResourceManager is not started as a classic system daemon. There is a python process that actually forks the java processes.

`-python-+-python2.6
              |-python2.6---5*[{python2.6}]
              |-java---107*[{java}]
              `-java---213*[{java}]

  |-python /usr/lib64/cmf/agent/build/env/bin/supervisord
  |   |-java -Dproc_resourcemanager -Xmx1000m -Djava.net.preferIPv4Stack=true-
  |   |   |-{java}
  |   |   |-{java}

Yes! That's the supervisord I was expecting. I was also curious about /var/run/cloudera-scm-agent/, so I dug into the directory to find out what it holds.

Surprise, surprise... there is a supervisord.conf configuration within the directory.

[root@ip-172-31-17-183 supervisor]# cat /var/run/cloudera-scm-agent/supervisor/supervisord.conf
[unix_http_server]
file=%(here)s/supervisord.sock
username=6434554715077552454
password=8561047171289009924

[inet_http_server]
port=127.0.0.1:19001
username=6434554715077552454
password=8561047171289009924

[supervisord]
nodaemon=false
logfile=/var/log/cloudera-scm-agent/supervisord.log
identifier=agent-1626-1470791793

[include]
files = /var/run/cloudera-scm-agent/supervisor/include/*.conf

[supervisorctl]
serverurl=http://127.0.0.1:19001/
username=6434554715077552454
password=8561047171289009924

Aha! Now I know there is a port listening on 19001, and the credentials are listed right in the file. It also includes the per-role sub conf files used by the other daemons. Very satisfying, indeed. Now I want to know more about the port and the web UI of supervisord/supervisorctl.

Sure enough, there is a port listening on 19001 on localhost. Perfect!

[root@ip-172-31-17-183 supervisor]# netstat -atun | grep 19001
tcp        0      0 127.0.0.1:19001             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:41185             127.0.0.1:19001             ESTABLISHED
tcp        0      0 127.0.0.1:19001             127.0.0.1:41185             ESTABLISHED

Now, I just need to expose that localhost-only port externally via SSH tunnelling. That's easy: just pass the -L option shown below when you log in.

MacBook-Pro:Downloads yenonn$ ssh -i hadoop.pem -L19001:localhost:19001 ec2-user@ec2-54-179-147-37.ap-southeast-1.compute.amazonaws.com
Last login: Tue Aug  9 21:21:18 2016 from 223.197.191.42
-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
[ec2-user@ip-172-31-17-183 ~]$

Now I am ready to explore the supervisord web UI from my browser at http://localhost:19001/. Beautiful! It means I can start up hadoop services from here. Pretty neat!

If, let's say, you are not a big fan of web UIs, we can use supervisorctl to achieve the same purpose.

[root@ip-172-31-17-183 supervisor]# /usr/lib64/cmf/agent/build/env/bin/supervisorctl
49-cloudera-mgmt-SERVICEMONITOR  RUNNING    pid 5526, uptime 0:03:36
53-hdfs-NAMENODE                 RUNNING    pid 2572, uptime 0:32:13
59-yarn-RESOURCEMANAGER          RUNNING    pid 2669, uptime 0:32:13
cmflistener                      RUNNING    pid 1801, uptime 0:32:18
flood                            RUNNING    pid 1991, uptime 0:32:16

I can point supervisorctl at the supervisord.conf to check status and start/stop the services from here.

[root@ip-172-31-17-183 supervisor]# /usr/lib64/cmf/agent/build/env/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf status
49-cloudera-mgmt-SERVICEMONITOR  RUNNING    pid 5526, uptime 0:05:01
53-hdfs-NAMENODE                 RUNNING    pid 2572, uptime 0:33:38
59-yarn-RESOURCEMANAGER          RUNNING    pid 2669, uptime 0:33:38
cmflistener                      RUNNING    pid 1801, uptime 0:33:43
flood                            RUNNING    pid 1991, uptime 0:33:41

[root@ip-172-31-17-183 supervisor]# /usr/lib64/cmf/agent/build/env/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf stop 49-cloudera-mgmt-SERVICEMONITOR
49-cloudera-mgmt-SERVICEMONITOR: stopped

[root@ip-172-31-17-183 supervisor]# /usr/lib64/cmf/agent/build/env/bin/supervisorctl -c /var/run/cloudera-scm-agent/supervisor/supervisord.conf start 49-cloudera-mgmt-SERVICEMONITOR
49-cloudera-mgmt-SERVICEMONITOR: started
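Putting it together, a tiny wrapper of my own (not something Cloudera ships) to bounce a role on the local node could look like the sketch below. Note that the numeric prefixes such as 59-yarn-RESOURCEMANAGER are generated per node, so check the status output first to get the current name.

#!/bin/bash
# bounce_role.sh - stop/start a Cloudera-managed role via the agent's supervisord
# usage: ./bounce_role.sh 59-yarn-RESOURCEMANAGER
SUPERVISORCTL=/usr/lib64/cmf/agent/build/env/bin/supervisorctl
CONF=/var/run/cloudera-scm-agent/supervisor/supervisord.conf
ROLE="$1"

$SUPERVISORCTL -c $CONF status "$ROLE"
$SUPERVISORCTL -c $CONF stop "$ROLE"
$SUPERVISORCTL -c $CONF start "$ROLE"
$SUPERVISORCTL -c $CONF status "$ROLE"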


Hope you like it. I love Cloudera and will continue to do my exploration!!

Monday, June 13, 2016

Graphite-web in container

hi all,

I am trying to separate each component of the graphite stack into small pieces so that it is agile and easy to duplicate when you need to scale out. I have built my first graphite-web container and published it on github. I will continue to work on the carbon-cache Dockerfile and the other components.

Graphite-web: https://github.com/yenonn/docker-graphite

Thursday, June 9, 2016

Setting up a graphite service in CENTOS 7.2

hi all,

I just want to list down the steps of a graphite installation on CentOS 7.2.


First you have to install the prerequisite packages.

1. yum -y install epel-release python-pip python-devel gcc libev libev-devel pycairo rrdtool-python mod_wsgi git httpd libffi-devel


2. pip install django carbon whisper graphite-web django-tagging pytz


3. pip install Twisted==16.0.0


Here are the packages that I used
Django==1.11.4
whisper==1.0.2
django-tagging==0.4.3
pytz==2017.2
Twisted==16.0.0

4. chmod a+w /opt/graphite/storage


Start copying the conf files and make changes.

5. cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf

6. cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi


7. cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf

8. cp /opt/graphite/webapp/graphite/local_settings.py.example /opt/graphite/webapp/graphite/local_settings.py 

Once you are done copying local_settings.py, update TIME_ZONE and SECRET_KEY in it. Use tzselect if you don't know your time zone string. The secret key can be any string you can think of.

Now you can start working on your apache settings.

9. cp /opt/graphite/examples/example-graphite-vhost.conf /etc/httpd/conf.d/graphite-vhost.conf


Here is how the relevant lines in graphite-vhost.conf should look, for the Alias /content/ mapping and the <Directory /opt/graphite/conf> block.

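The exact lines vary with your graphite-web version, but based on the stock example vhost they look roughly like this (a sketch only; adjust the paths if your layout differs, and note that Apache 2.4 on CentOS 7 wants Require all granted instead of the old Order/Allow pair):

Alias /content/ /opt/graphite/webapp/content/
<Location "/content/">
    SetHandler None
</Location>

<Directory /opt/graphite/conf/>
    Require all granted
</Directory>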
Comment out the extra mod_wsgi LoadModule line in the other conf file:

10. cat /etc/httpd/conf.modules.d/10-wsgi.conf

#LoadModule wsgi_module modules/mod_wsgi.so



In the default /etc/httpd/conf/httpd.conf, you have to update the ServerName. After that you can start your web server.

11. ServerName  localhost:80


12. setenforce 0; systemctl enable httpd.service; systemctl start httpd.service


Now you can initialise the django database for the webapp. Answer all the questions asked during the initial setup, and put in your admin credentials and email.

13. python /opt/graphite/webapp/graphite/manage.py migrate auth

14. python /opt/graphite/webapp/graphite/manage.py syncdb

Alternatively, on newer graphite-web releases the same can be done with django-admin.py:

13. PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings auth

14. PYTHONPATH=/opt/graphite/webapp django-admin.py syncdb

Refer to the graphite-web documentation for more up-to-date details on the webapp database setup. Now you can start working on the carbon daemon and start it.

15. python /opt/graphite/bin/carbon-cache.py start


[root@ip-172-31-24-138 ~]# python /opt/graphite/bin/carbon-cache.py start
Starting carbon-cache (instance a)
[root@ip-172-31-24-138 ~]# python /opt/graphite/bin/carbon-cache.py stop
Sending kill signal to pid 25259
[root@ip-172-31-24-138 graphite]# netstat -atun |grep 2003
tcp        0      0 0.0.0.0:2003            0.0.0.0:*               LISTEN

Next, verify that the carbon port and the web port are listening.

16. netstat -atun | grep 2003
tcp        0      0 0.0.0.0:2003                0.0.0.0:*                   LISTEN

17. netstat -atun | grep 80
tcp        0      0 0.0.0.0:80                0.0.0.0:*                   LISTEN


Wait a while after you have started the services, then look into the graphite storage directory for newly created whisper files. Keep polling the directory for new whisper files.


18. [root@localhost ~]# find /opt/graphite/storage/ -name "*.wsp"

/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/size.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/queries.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/bulk_queries.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/queues.wsp
/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/cache/overflow.wsp

/opt/graphite/storage/whisper/carbon/agents/localhost_localdomain-a/memUsage.wsp



Once all done, your graphite should be up and running. You can open your browser and go to the graphite web UI to verify it. At this moment you should not see any data coming in from other servers yet; for that you need to continue with the collectd installation on the client nodes.
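Before touching collectd, you can already push a hand-crafted datapoint into carbon's plaintext listener on port 2003 and watch it appear (a quick sanity test; the metric name is made up):

echo "test.bash.hello 42 $(date +%s)" > /dev/tcp/localhost/2003   # or pipe the same line through nc localhost 2003
find /opt/graphite/storage/whisper/test -name "*.wsp"              # a new whisper file should show up shortly
curl "http://localhost/render?target=test.bash.hello&format=json&from=-10min"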

Sunday, May 29, 2016

Kernel parameters setting for low latency system


hi all,

I would like to share the settings that I feel are recommended for most low latency systems.

# disable the ipv6 setting
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

# reduce swappiness
vm.swappiness = 1

# increase the network buffer sizes
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216

# increase the mmap limit
vm.max_map_count = 131072

Also, the limits settings (in /etc/security/limits.conf):
* - nofile 100000
* - memlock unlimited
* - nproc 32768
* - as unlimited
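To make them persistent I usually keep them in dedicated files and reload from there; the file names below are just my own convention:

# sysctl settings, e.g. saved in /etc/sysctl.d/99-lowlatency.conf
sysctl -p /etc/sysctl.d/99-lowlatency.conf

# limits, e.g. saved in /etc/security/limits.d/99-lowlatency.conf,
# then verify from a fresh login shell:
ulimit -n    # expect 100000
ulimit -l    # expect unlimited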

Monday, May 16, 2016

Bash in action: detecting opening port

hi all,

If you don't have the usual unix/linux utilities such as netstat or nc available to detect an open port, here is a simple trick using bash's built-in /dev/tcp that can get the job done. Have fun!

# example target; adjust these to your environment
SERVER="localhost"
port=22

timeout 1 bash -c "cat < /dev/null > /dev/tcp/$SERVER/$port"
if [ "$?" -ne 0 ]
then
    echo "Connection to $SERVER on port $port failed."
else
    echo "Connection to $SERVER on port $port succeeded."
fi
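If you need the check in more than one place, it wraps nicely into a small function; the host and ports below are just placeholders:

check_port() {
    # usage: check_port <host> <port>
    local host="$1" port="$2"
    if timeout 1 bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null
    then
        echo "Connection to $host on port $port succeeded."
    else
        echo "Connection to $host on port $port failed."
    fi
}

for p in 22 80 8020
do
    check_port localhost $p
done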


Wednesday, April 27, 2016

Disks IO bottleneck with HP Smart Path enabled

hi all,

If you are on HP ProLiant hardware with a Smart Array controller such as the P420 or P440, one thing you need to pay attention to is a disk IO bottleneck when HP SSD Smart Path is enabled. This discussion is restricted to systems with SSD disks, so I am assuming you have SSDs attached to your Smart Array P420 or P440. The symptom is that the wait times reported by iostat are significantly high: an await of 10.69 or 14.70 ms simply does not make sense for SSD disks.

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda                 0.00     2.00    0.00    2.00     0.00    32.00    16.00     0.00    0.00   0.00   0.00
sdc                 0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb                 0.00 17813.00    0.00  202.00     0.00 144120.00   713.47     2.16   10.69   0.84  17.00
sdd                 0.00 35523.00    0.00  405.00     0.00 283544.00   700.11     6.03   14.70   0.97  39.20


[root@nasilemak root]# hpssacli ctrl all show config detail

Smart Array P440 in Slot 1
   Bus Interface: PCI
   Slot: 1
....

Array: B
      Interface Type: SAS
      Unused Space: 0  MB (0.0%)
      Used Space: 1.6 TB (100.0%)
      Status: OK
      Array Type: Data
      HP SSD Smart Path: enable

Logical Drive: 2
         Size: 447.1 GB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Disabled
         Unique Identifier: 600508B1001CAAF2E9A8EB80A2264AEC
         Disk Name: /dev/sdb
         Mount Points: /data/cass01 447.1 GB Partition Number 2
         OS Status: LOCKED
         Logical Drive Label: 04BAACD3PDNMF0ARH8B30108B5
         Drive Type: Data

         LD Acceleration Method: HP SSD Smart Path



So, after some reading and research, I decided to disable this option and switch to the controller cache instead. Most of my JBOD/RAID0 systems (hadoop, cassandra and elasticsearch) will follow this configuration.

Here is a small script to help me run the task.

#!/bin/bash

HPSSACLI=`which hpssacli`
$HPSSACLI ctrl all show config detail | sed -e '/^$/d' | tr '[:upper:]' '[:lower:]' > /tmp/smartarray.out

if [ `grep -i "HP SSD Smart Path: enable" /tmp/smartarray.out | wc -l` -ne "0" ]
then
        slot=`grep slot /tmp/smartarray.out | head -n1 | awk '{print $NF}'`
        grep -i -B6 "HP SSD Smart Path: enable" /tmp/smartarray.out | awk -F: '/array:/ {print $2}' > /tmp/array.out
        while read array
        do
                logicaldrive=`grep -i -A7 "Array: $array" /tmp/smartarray.out | tail -n1 | awk -F": " '{print $2}'`
                echo "Array $array has not disabled on the HP smart path"
                echo "Action: Disabling Array $array, logical drive $logicaldrive the HP smart path now ..."
                $HPSSACLI controller slot=$slot array $array modify ssdsmartpath=disable && \
                $HPSSACLI controller slot=$slot logicaldrive $logicaldrive modify caching=enable
                if [[ "$?" == "0" ]]
                then
                        echo "Array: $array, logicaldrive: $logicaldrive has been disabled."
                fi
        done < /tmp/array.out
        rm -fr /tmp/smartarray.out /tmp/array.out
else
        echo "HP Smart Path is already disabled on all logical drives."
fi


And the result will look like the following.

[root@nasilemak root]# ./disable_smartpath.sh
Array b has not disabled on the HP smart path
Action: Disabling Array b, logical drive 2 the HP smart path now ...
Array: b, logicaldrive: 2 has been disabled.
Array c has not disabled on the HP smart path
Action: Disabling Array c, logical drive 3 the HP smart path now ...
Array: c, logicaldrive: 3 has been disabled.
Array d has not disabled on the HP smart path
Action: Disabling Array d, logical drive 4 the HP smart path now ...
Array: d, logicaldrive: 4 has been disabled.
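To double check that the change took effect, the same hpssacli output can be filtered down to the relevant fields; the Smart Path and Caching lines should now show the new state:

hpssacli ctrl all show config detail | grep -iE "ssd smart path|caching"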

After the change, iostat gives a much better result.

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00 39853.00    0.00 3943.00     0.00 350368.00    88.86     1.26    0.32   0.02   8.50
sdc               0.00 15181.00   25.00  213.00  4800.00 118088.00   516.34     0.22    0.88   0.17   4.00
sdb               0.00  7247.00   83.00  126.00 20896.00 58984.00   382.20     0.13    0.61   0.26   5.40
sdd               0.00 24676.00   15.00  356.00  3200.00 194424.00   532.68     0.32    0.83   0.22   8.30


I hope this helps overcome the IO bottleneck on your cluster. Now we have a low latency, high throughput system.

Sunday, April 3, 2016

Supervisord and docker services

hi all,

I have been working on docker containers recently and find that supervisord is a wonderful tool to start/stop services and keep them alive inside a container. So, I would like to share my steps for setting it up.

1. Installing supervisord.

wget http://peak.telecommunity.com/dist/ez_setup.py;python ez_setup.py \
&& easy_install supervisor

2. Once that is done, you are ready to define the services that will be governed by supervisord.

Here is an example /etc/supervisord.conf:


[supervisord]
nodaemon=true

[program:sshd]
command=/usr/sbin/sshd -D

[program:redis]
command=/etc/init.d/redis start

[program:rabbitmq-server]
command=/etc/init.d/rabbitmq-server start

[program:sensu-server]
command=/etc/init.d/sensu-server start

[program:uchiwa]
command=/etc/init.d/uchiwa start

[program:sensu-api]
command=/etc/init.d/sensu-api start

3. Then you can start the supervisord binary.

/usr/bin/supervisord

If you are working with a Dockerfile, it may look like this.

#supervisord
RUN wget http://peak.telecommunity.com/dist/ez_setup.py;python ez_setup.py \
    && easy_install supervisor

ADD files/supervisord.conf /etc/supervisord.conf

#start command
CMD ["/usr/bin/supervisord"]