Hi all,
I was tasked with massaging some log files and piping the relevant entries into a dedicated custom log. The idea is to detect any disk failures that happen on a Hadoop cluster. Say we are running a Hadoop cluster of 30+ nodes, each with 10 local disks attached. Once the cluster has been serving the business for a while, the chance of having worn or bad disks is really high. So, instead of proactively scanning the disks on 30+ nodes every day, we can use tools we already have to achieve the same goal.
These are the existing tools:
1. rsyslog
2. Any monitoring tool, such as Nagios, an OVO agent, or a BMC Patrol agent.
Here are the steps I used to harvest the disk failure messages from the standard system log, /var/log/messages*.
1. Put a conf file like this in /etc/rsyslog.d/.
[root@centos65-1 ~]# cat /etc/rsyslog.d/hadoop.conf
:msg, contains, "offline" /var/log/hadoop_disk.log
2. Create the log file: touch /var/log/hadoop_disk.log
3. Restart the rsyslog daemon (on CentOS 6, service rsyslog restart).
Voila, you are done!
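Before restarting rsyslog, it can help to preview which lines the filter would catch. The sketch below approximates the contains test with a case-sensitive fixed-string grep. The helper name preview_offline and the second sample line are made up for illustration, and grep looks at the whole line whereas rsyslog only compares the msg part, so treat it as a rough check rather than an exact replica.

```shell
#!/usr/bin/env bash
# Rough preview of rsyslog's `:msg, contains, "offline"` filter.
# Assumption: a case-sensitive fixed-string grep is a close enough stand-in.
# preview_offline is a hypothetical helper name, not part of rsyslog.
preview_offline() {
    grep -F -- "offline" "$@"
}

# Illustrative sample lines; the sshd line is invented for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jun 25 10:15:06 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Jun 25 10:15:07 centos65-1 sshd[1234]: Accepted password for root
EOF

matches=$(preview_offline "$sample")
echo "$matches"
rm -f "$sample"
```

On a real node the same helper could be pointed at the existing logs, for example preview_offline /var/log/messages*.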
Now, we want to test it. Here are the steps.
1. Add a small disk to the VM, partition it, format it, and mount it.
[root@centos65-1 ~]# df /mnt/test
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 2063504 35840 1922844 2% /mnt/test
2. Set the disk state to offline.
[root@centos65-1 ~]# echo "offline" > /sys/block/sda/device/state
[root@centos65-1 ~]#
3. cd into the mount point and try to list it.
[root@centos65-1 ~]# cd /mnt/test
[root@centos65-1 test]# ls
ls: reading directory .: Input/output error
Now the matching kernel messages should be landing in /var/log/hadoop_disk.log.
[root@centos65-1 ~]# tail -f /var/log/hadoop_disk.log
Jun 25 10:15:06 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Jun 25 10:15:12 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Jun 25 10:15:12 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Jun 25 10:15:12 centos65-1 kernel: sd 0:0:0:0: rejecting I/O to offline device
Good stuff! This is what we can achieve. As a next step, we can configure our monitoring tool to watch this log file and eventually generate a ticket to alert the business.
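As one way to wire this into a monitoring tool, here is a minimal sketch of a Nagios-style check. It relies only on the standard plugin exit codes (0 for OK, 2 for CRITICAL, 3 for UNKNOWN). The function name check_hadoop_disk_log is hypothetical, and how your own agent (OVO, BMC Patrol) consumes the result is left as an assumption.

```shell
#!/usr/bin/env bash
# Nagios-style check: exit 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
# check_hadoop_disk_log is a hypothetical name; the default path is the
# custom log configured in /etc/rsyslog.d/hadoop.conf.
check_hadoop_disk_log() {
    local log="${1:-/var/log/hadoop_disk.log}"
    local count

    if [ ! -r "$log" ]; then
        echo "UNKNOWN - cannot read $log"
        return 3
    fi

    # Count the lines rsyslog has routed into the custom log.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    count=$(grep -c -F -- "offline" "$log" || true)
    if [ "$count" -gt 0 ]; then
        echo "CRITICAL - $count offline message(s) in $log"
        return 2
    fi
    echo "OK - no offline messages in $log"
    return 0
}
```

Calling check_hadoop_disk_log with no argument checks /var/log/hadoop_disk.log. Note that this simple version keeps reporting CRITICAL until the log is rotated or truncated, which may or may not be what you want.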
Tuesday, June 24, 2014
Thursday, June 12, 2014
Creating a sparse file to show how thin provisioning is possible on Linux
Hi,
Recently I had a good lunch meetup with Wing Loon to dig into some questions I had been sitting on for weeks. We were discussing how Docker can give each container extra disk space (by default it allocates 10 GB per container) while my VM has not actually given up that much disk space. We pondered and had no clue, until we bumped into a computer science term: sparse file. In layman's terms, it works like thin provisioning: the system advertises a size up front but does not really claim data blocks until you write to them. This type of file grows as data is written.
Now, let me walk you through a few steps to create this type of file.
[root@localhost ~]# dd of=sparse-file bs=1M seek=102400 count=0
0+0 records in
0+0 records out
0 bytes (0 B) copied, 5.3053e-05 s, 0.0 kB/s
[root@localhost ~]# ls -al
total 64
dr-xr-x---. 5 root root 4096 Jun 13 11:10 .
drwxr-xr-x. 18 root root 4096 Jun 13 10:28 ..
-rw-------. 1 root root 1082 Jun 6 20:26 anaconda-ks.cfg
-rw-------. 1 root root 12916 Jun 13 10:06 .bash_history
-rw-r--r--. 1 root root 18 Dec 4 2013 .bash_logout
-rw-r--r--. 1 root root 262 Jun 9 11:12 .bash_profile
-rw-r--r--. 1 root root 176 Dec 4 2013 .bashrc
-rw-r--r--. 1 root root 100 Dec 4 2013 .cshrc
-rw-------. 1 root root 98 Jun 13 10:00 .lesshst
drwxr-xr-x. 3 root root 4096 Jun 6 21:01 .local
drwxr-xr-x. 4 root root 4096 Dec 12 2013 .mozilla
drwxr-----. 3 root root 4096 Jun 6 20:30 .pki
-rw-r--r--. 1 root root 107374182400 Jun 13 11:10 sparse-file
-rw-r--r--. 1 root root 129 Dec 4 2013 .tcshrc
[root@localhost ~]#
[root@localhost ~]# du -s sparse-file
0 sparse-file
If you followed the commands above, you will be curious why ls -al shows such a huge size for sparse-file (107374182400 bytes, i.e. 100 GiB), and yet du -s sparse-file reports 0. ls shows the apparent size, while du shows the blocks actually allocated on disk. That is what we call a sparse file.
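The same comparison can be scripted. This is a small sketch that assumes GNU coreutils (stat -c, du -k) and a filesystem that supports holes; it uses a 100 MiB temporary file instead of 100 GiB so it is quick to try anywhere.

```shell
#!/usr/bin/env bash
# Create a small sparse file and compare apparent size with allocated space.
# Assumes GNU coreutils and a filesystem that supports sparse files.
f=$(mktemp)

# Same dd trick as above, but only 100 MiB: seek past the data, write nothing.
dd if=/dev/null of="$f" bs=1M seek=100 count=0 2>/dev/null

apparent_bytes=$(stat -c %s "$f")        # the size ls -l reports
allocated_kb=$(du -k "$f" | cut -f1)     # the space du reports

echo "apparent:  $apparent_bytes bytes"
echo "allocated: $allocated_kb KiB"
rm -f "$f"
```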
Next, let's make use of this file: I can create a device-mapper device on top of it and play a prank on my friends. This is how it goes.
[root@localhost ~]# losetup -f
/dev/loop0
[root@localhost ~]# losetup /dev/loop0 /root/sparse-file
[root@localhost ~]# losetup
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0 0 0 0 0 /root/sparse-file
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# blockdev --getsz /root/sparse-file
209715200
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# dmsetup create hiu_prank --table "0 209715200 linear /dev/loop0 0"
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# dmsetup ls
fedora-swap (253:1)
fedora-root (253:0)
hiu_prank (253:2)
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# fdisk -l /dev/mapper/hiu_prank
Disk /dev/mapper/hiu_prank: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
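The 209715200 in the dmsetup table is simply the file size expressed in 512-byte sectors. A short sketch of that arithmetic, with the variable names chosen here for illustration:

```shell
#!/usr/bin/env bash
# Derive the sector count used in the dmsetup table from the file size.
size_bytes=107374182400      # size of sparse-file as shown by ls -al
sector_bytes=512             # device-mapper tables count 512-byte sectors

sectors=$(( size_bytes / sector_bytes ))

# Table format for the linear target:
#   <start sector> <length in sectors> linear <backing device> <offset sector>
table="0 $sectors linear /dev/loop0 0"
echo "$table"
```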
Now I have a fake disk that claims to be 100 GiB. I guess you know how to format it, mount it, and write data onto it.
By the way, /var/log/lastlog is another sparse file, and one we use daily.
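Whatever gets written to the fake disk ends up in the backing sparse file, and that is the moment real blocks are claimed. The sketch below shows the effect on a plain temporary file, so it needs no root, loop device, or device mapper. It assumes GNU coreutils; exact numbers will vary by filesystem.

```shell
#!/usr/bin/env bash
# Show a sparse file claiming real blocks only once data is written.
# Assumes GNU coreutils; works on a plain file, no loop device required.
f=$(mktemp)
dd if=/dev/null of="$f" bs=1M seek=100 count=0 2>/dev/null

before_kb=$(du -k "$f" | cut -f1)

# Write 1 MiB of random data at the start without truncating the file.
dd if=/dev/urandom of="$f" bs=1M count=1 iflag=fullblock \
   conv=notrunc,fsync 2>/dev/null

after_kb=$(du -k "$f" | cut -f1)
size_bytes=$(stat -c %s "$f")

echo "allocated before: $before_kb KiB, after: $after_kb KiB"
echo "apparent size still: $size_bytes bytes"
rm -f "$f"
```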
Enjoy!
(P.S. thank you wingloon.com)