Tuesday, June 24, 2014

Managing bad disks in Hadoop cluster

Hi all,

I was tasked with massaging some log files and piping the relevant messages into a custom log. The idea is to detect any disk failures that happen on the Hadoop cluster. For example, suppose we run a Hadoop cluster of 30+ nodes, each with 10 local disks attached. The chance of having worn or bad disks is really high once the cluster has been serving the business for a while. So, instead of proactively scanning the disks on 30+ nodes every day, we can use existing tools to achieve our goal.

These are the existing tools:

1. rsyslog
2. Any sort of monitoring tool, such as Nagios, an OVO agent, or a BMC Patrol agent.


I want to document the steps I used to harvest the disk failure messages from the standard system log, /var/log/messages*.

1. Create a conf file like this under /etc/rsyslog.d/.

[root@centos65-1 ~]# cat /etc/rsyslog.d/hadoop.conf
:msg, contains, "offline"    /var/log/hadoop_disk.log


2. Touch /var/log/hadoop_disk.log so that the file exists.

3. Restart the rsyslog daemon.


Voila, you are done!
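Before relying on the new rule, it can help to preview which messages it would catch. The sketch below approximates the rsyslog "contains" filter with a fixed-string grep. It is only an approximation: grep matches against the whole line, while rsyslog checks the msg property. The function name preview_offline_matches and the sample log lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a log file that contain the
# substring "offline" (case-sensitive, like rsyslog's "contains" filter).
preview_offline_matches() {
    grep -F -- "offline" "$1"
}

# Made-up sample log: one unrelated line and one offline-disk kernel message.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Jun 25 10:15:01 centos65-1 crond[1234]: (root) CMD (run-parts /etc/cron.hourly)
Jun 25 10:15:06 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
EOF

# Count the matching lines; the arithmetic expansion strips any padding from wc.
match_count=$(( $(preview_offline_matches "$sample_log" | wc -l) ))
echo "matching lines: $match_count"
rm -f "$sample_log"
```

Pointing the same helper at /var/log/messages on a real node should give a rough idea of how noisy the rule would be, assuming the log is readable by the user running it.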


Now, we want to test it. Here are the steps.

1. Create a small disk in the VM, partition it, format it, and mount it.

[root@centos65-1 ~]# df /mnt/test
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sda1        2063504 35840   1922844   2% /mnt/test


2. Set the disk state to offline.

[root@centos65-1 ~]# echo "offline" > /sys/block/sda/device/state
[root@centos65-1 ~]#


3. cd into the mount point and try to list it.

[root@centos65-1 ~]# cd /mnt/test
[root@centos65-1 test]# ls
ls: reading directory .: Input/output error


Now you should see the matching system log messages written to /var/log/hadoop_disk.log.

[root@centos65-1 ~]# tail -f /var/log/hadoop_disk.log
Jun 25 10:15:06 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Jun 25 10:15:12 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Jun 25 10:15:12 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Jun 25 10:15:12 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device


Good stuff! This is what we can achieve. As a next step, we can configure our monitoring tool to watch this log file and eventually generate a ticket to alert the business.
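As a rough idea of what the monitoring side could look like, here is a minimal sketch of a check that a tool such as Nagios might run. It assumes the Nagios plugin exit-code convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). The function name check_disk_log is hypothetical, and the demo runs against a temporary file rather than the real log.

```shell
#!/bin/sh
# Hypothetical check: report CRITICAL when the disk log contains any entries.
# Return codes follow the Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).
check_disk_log() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "UNKNOWN: cannot read $log_file"
        return 3
    fi
    # grep -c . counts non-empty lines; "|| true" hides its exit status of 1
    # when the count is zero.
    count=$(grep -c . "$log_file" || true)
    if [ "$count" -gt 0 ]; then
        echo "CRITICAL: $count offline-disk messages in $log_file"
        return 2
    fi
    echo "OK: no offline-disk messages in $log_file"
    return 0
}

# Demo against a temporary file holding one sample kernel message.
demo_log=$(mktemp)
echo "Jun 25 10:15:06 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device" > "$demo_log"
status=0
message=$(check_disk_log "$demo_log") || status=$?
echo "$message (exit code $status)"
rm -f "$demo_log"
```

In practice the check would be pointed at /var/log/hadoop_disk.log. The log would also need rotating or clearing once a disk is replaced, otherwise a check like this never goes back to OK.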


Thursday, June 12, 2014

Creating sparse file to show how thin-provisioning is possible on Linux

Hi,

Recently I had a good lunch meetup with Wing Loon to investigate some questions that I had kept for weeks. We were discussing how Docker can store each container with extra disk space (by default it allocates 10 GB for each container), yet my VM had not allocated such a huge amount of disk space. We pondered and had no clue, until we bumped into a computer science term called a sparse file. In layman's terms, it is something like thin provisioning: the system records a large file size up front, but does not actually claim data blocks until you write to them. This type of file grows as data is written.

Now, I would like to walk you through a few steps to create this type of file.


[root@localhost ~]# dd of=sparse-file bs=1M seek=102400 count=0
0+0 records in
0+0 records out
0 bytes (0 B) copied, 5.3053e-05 s, 0.0 kB/s
[root@localhost ~]# ls -al
total 64
dr-xr-x---.  5 root root         4096 Jun 13 11:10 .
drwxr-xr-x. 18 root root         4096 Jun 13 10:28 ..
-rw-------.  1 root root         1082 Jun  6 20:26 anaconda-ks.cfg
-rw-------.  1 root root        12916 Jun 13 10:06 .bash_history
-rw-r--r--.  1 root root           18 Dec  4  2013 .bash_logout
-rw-r--r--.  1 root root          262 Jun  9 11:12 .bash_profile
-rw-r--r--.  1 root root          176 Dec  4  2013 .bashrc
-rw-r--r--.  1 root root          100 Dec  4  2013 .cshrc
-rw-------.  1 root root           98 Jun 13 10:00 .lesshst
drwxr-xr-x.  3 root root         4096 Jun  6 21:01 .local
drwxr-xr-x.  4 root root         4096 Dec 12  2013 .mozilla
drwxr-----.  3 root root         4096 Jun  6 20:30 .pki
-rw-r--r--.  1 root root 107374182400 Jun 13 11:10 sparse-file
-rw-r--r--.  1 root root          129 Dec  4  2013 .tcshrc
[root@localhost ~]#

[root@localhost ~]# du -s sparse-file
0    sparse-file


If you understand the commands above, you will be curious why ls -al shows such a huge number for sparse-file, and yet du -s sparse-file reports zero. ls shows the apparent size recorded for the file, while du counts the blocks actually allocated on disk. That is what we call a sparse file.
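The same comparison can be scripted. The sketch below is a minimal version that assumes GNU coreutils (truncate, du) and a filesystem that supports sparse files. It uses truncate instead of the dd command above, and the file name and 100 MiB size are illustrative.

```shell
#!/bin/sh
# Create a 100 MiB sparse file in a scratch directory (names are illustrative).
work_dir=$(mktemp -d)
sparse_file="$work_dir/sparse-demo"
truncate -s 100M "$sparse_file"

# Apparent size in bytes: the number that ls -l reports.
apparent_bytes=$(( $(wc -c < "$sparse_file") ))
# Allocated size in KiB: the number that du reports.
allocated_kb=$(du -k "$sparse_file" | cut -f1)

echo "apparent: $apparent_bytes bytes, allocated: $allocated_kb KiB"
rm -rf "$work_dir"
```

The gap between the two numbers is the space that has been promised but not yet claimed.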


So, next we will make use of this file: I can create a device mapper device on top of it and play a prank on my friends. This is how it goes.


[root@localhost ~]# losetup -f
/dev/loop0
[root@localhost ~]# losetup /dev/loop0 /root/sparse-file
[root@localhost ~]# losetup
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0         0      0         0  0 /root/sparse-file
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# blockdev --getsz /root/sparse-file
209715200
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# dmsetup create hiu_prank --table "0 209715200 linear /dev/loop0 0"
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# dmsetup ls
fedora-swap    (253:1)
fedora-root    (253:0)
hiu_prank    (253:2)
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# fdisk -l /dev/mapper/hiu_prank

Disk /dev/mapper/hiu_prank: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes



Now I have a fake disk that claims to have 100 GB. I guess you know how to format it, mount it, and write data onto it.
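Once data is written through the fake disk, the backing sparse file starts claiming real blocks. The sketch below shows that effect without root by writing 1 MiB directly into a small sparse file rather than through /dev/mapper/hiu_prank. It assumes GNU coreutils, and the file name, sizes, and offset are illustrative.

```shell
#!/bin/sh
# Create a 100 MiB sparse file to stand in for the backing file.
work_dir=$(mktemp -d)
backing_file="$work_dir/backing-demo"
truncate -s 100M "$backing_file"
before_kb=$(du -k "$backing_file" | cut -f1)

# Write 1 MiB of random data at a 50 MiB offset.
# conv=notrunc keeps the file length unchanged; conv=fsync flushes the data
# so that du sees the newly allocated blocks.
dd if=/dev/urandom of="$backing_file" bs=1M seek=50 count=1 \
   iflag=fullblock conv=notrunc,fsync 2>/dev/null

after_kb=$(du -k "$backing_file" | cut -f1)
size_bytes=$(( $(wc -c < "$backing_file") ))
echo "allocated before: $before_kb KiB, after: $after_kb KiB, apparent: $size_bytes bytes"
rm -rf "$work_dir"
```

The apparent size stays at 100 MiB, while the allocated size grows by roughly the amount written.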

By the way, /var/log/lastlog is another sparse file, one that we use daily.

Enjoy!
(P.S. Thank you, wingloon.com)