Wednesday, April 27, 2016

Disk IO bottleneck with HP Smart Path enabled

Hi all,

If you are running HP ProLiant servers with a Smart Array controller such as the P420 or P440, there is something you need to pay attention to: a disk IO bottleneck when HP SSD Smart Path is enabled. This discussion only applies to systems with SSD disks, and I am assuming your SSDs are attached to a Smart Array P420 or P440. The symptom is significantly high IO wait times in iostat; an await of 10.69 or 14.70 ms simply does not make sense for SSD disks.

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda                 0.00     2.00    0.00    2.00     0.00    32.00    16.00     0.00    0.00   0.00   0.00
sdc                 0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb                 0.00 17813.00    0.00  202.00     0.00 144120.00   713.47     2.16   10.69   0.84  17.00
sdd                 0.00 35523.00    0.00  405.00     0.00 283544.00   700.11     6.03   14.70   0.97  39.20
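
For reference, the output above comes from iostat in the sysstat package. A minimal way to watch the extended per-device statistics is something like the following; the exact flags and interval are my own choice, not necessarily what was used for the capture above.

# -d: device report only, -x: extended columns (await, svctm, %util)
# sample every second, five times
iostat -dx 1 5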


[root@nasilemak root]# hpssacli ctrl all show config detail

Smart Array P440 in Slot 1
   Bus Interface: PCI
   Slot: 1
....

Array: B
      Interface Type: SAS
      Unused Space: 0  MB (0.0%)
      Used Space: 1.6 TB (100.0%)
      Status: OK
      Array Type: Data
      HP SSD Smart Path: enable

Logical Drive: 2
         Size: 447.1 GB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Disabled
         Unique Identifier: 600508B1001CAAF2E9A8EB80A2264AEC
         Disk Name: /dev/sdb
         Mount Points: /data/cass01 447.1 GB Partition Number 2
         OS Status: LOCKED
         Logical Drive Label: 04BAACD3PDNMF0ARH8B30108B5
         Drive Type: Data

         LD Acceleration Method: HP SSD Smart Path



So, after some reading and research, I decided to disable HP SSD Smart Path and switch to the controller cache instead. Most of my JBOD/RAID0 systems running Hadoop, Cassandra and Elasticsearch will follow this configuration.
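
For a single array, the change is just two hpssacli commands (these are the same commands the script below runs; the slot number, array letter and logical drive number here are taken from my config output above, so adjust them for your controller):

# disable HP SSD Smart Path on array B of the controller in slot 1
hpssacli controller slot=1 array B modify ssdsmartpath=disable
# then enable the controller cache on its logical drive 2
hpssacli controller slot=1 logicaldrive 2 modify caching=enable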

Here is a small script I wrote to automate the task across all arrays.

#!/bin/bash
# Disable HP SSD Smart Path and enable the controller cache on every array
# that still has Smart Path enabled.

HPSSACLI=$(which hpssacli)

# Dump the full controller config, drop blank lines and lowercase it
# so the greps below are simpler.
$HPSSACLI ctrl all show config detail | sed -e '/^$/d' | tr '[:upper:]' '[:lower:]' > /tmp/smartarray.out

if [ "$(grep -c -i "hp ssd smart path: enable" /tmp/smartarray.out)" -ne 0 ]
then
    # Controller slot number (first "slot:" line in the dump).
    slot=$(grep slot /tmp/smartarray.out | head -n1 | awk '{print $NF}')

    # Collect the array letters that still have Smart Path enabled.
    grep -i -B6 "hp ssd smart path: enable" /tmp/smartarray.out | awk -F: '/array:/ {print $2}' > /tmp/array.out

    while read -r array
    do
        # The logical drive number sits a few lines below its "array:" line.
        logicaldrive=$(grep -i -A7 "array: $array" /tmp/smartarray.out | tail -n1 | awk -F": " '{print $2}')
        echo "Array $array still has HP SSD Smart Path enabled"
        echo "Action: disabling HP SSD Smart Path on array $array, logical drive $logicaldrive now ..."
        $HPSSACLI controller slot=$slot array $array modify ssdsmartpath=disable && \
        $HPSSACLI controller slot=$slot logicaldrive $logicaldrive modify caching=enable
        if [[ "$?" == "0" ]]
        then
            echo "HP SSD Smart Path has been disabled on array $array, logical drive $logicaldrive."
        fi
    done < /tmp/array.out

    rm -f /tmp/smartarray.out /tmp/array.out
else
    echo "HP SSD Smart Path is already disabled on all logical drives."
fi


And the result looks like the following.

[root@nasilemak root]# ./disable_smartpath.sh
Array b still has HP SSD Smart Path enabled
Action: disabling HP SSD Smart Path on array b, logical drive 2 now ...
HP SSD Smart Path has been disabled on array b, logical drive 2.
Array c still has HP SSD Smart Path enabled
Action: disabling HP SSD Smart Path on array c, logical drive 3 now ...
HP SSD Smart Path has been disabled on array c, logical drive 3.
Array d still has HP SSD Smart Path enabled
Action: disabling HP SSD Smart Path on array d, logical drive 4 now ...
HP SSD Smart Path has been disabled on array d, logical drive 4.

After the change, iostat shows much better numbers.

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00 39853.00    0.00 3943.00     0.00 350368.00    88.86     1.26    0.32   0.02   8.50
sdc               0.00 15181.00   25.00  213.00  4800.00 118088.00   516.34     0.22    0.88   0.17   4.00
sdb               0.00  7247.00   83.00  126.00 20896.00 58984.00   382.20     0.13    0.61   0.26   5.40
sdd               0.00 24676.00   15.00  356.00  3200.00 194424.00   532.68     0.32    0.83   0.22   8.30


I hope this helps you overcome the IO bottleneck in your cluster. Now we have a low latency, high throughput system.
