Sunday, September 29, 2013

Configuring bonding: RTNETLINK answers: File exists

Hi all,

Today I was trying to configure active/passive (active-backup) bonding on Linux, but I kept hitting the same error whenever I restarted the network service.

[root@node01 network-scripts]# /etc/init.d/network restart
Shutting down interface bond0:                             [  OK  ]
Shutting down interface bond1:                             [  OK  ]
Shutting down interface eth12:                             [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface bond0:                               [  OK  ]
Bringing up interface bond1:                               [  OK  ]
Bringing up interface eth2:  RTNETLINK answers: File exists
                                                           [  OK  ]
Bringing up interface eth12:                               [  OK  ]

I spent a couple of minutes troubleshooting with no luck. After a short break I came back, looked closely at my ifcfg-eth* configuration files, and found a silly mistake: ifcfg-eth2 had SLAVE=no even though it also had MASTER=bond1. That is why the network script kept printing the error for eth2.


[root@node01 network-scripts]# grep bond1 ifcfg-*
ifcfg-bond1:DEVICE=bond1
ifcfg-eth2:MASTER=bond1
ifcfg-eth9:MASTER=bond1
[root@node01 network-scripts]# cat ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=no
USERCTL=no
HWADDR=3C:D9:2B:F4:A1:9A
[root@node01 network-scripts]# cat ifcfg-eth9
DEVICE=eth9
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
USERCTL=no
HWADDR=44:1E:A1:17:F8:76

I hope you won't be as silly as I was.
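A quick way to catch this kind of mismatch is to scan the ifcfg files for any that name a MASTER without setting SLAVE=yes. The following is only a minimal sketch: check_bond_slaves is a hypothetical helper name, and the default directory is assumed to be the usual network-scripts location.

```shell
#!/bin/bash
# Sketch: report ifcfg files that declare a bonding MASTER but not SLAVE=yes.
# check_bond_slaves is a hypothetical helper name, not part of any package.
check_bond_slaves () {
    # Assumed default directory; pass another one as the first argument.
    local dir=${1:-/etc/sysconfig/network-scripts}
    local f rc=0
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        # Only files that declare a MASTER are meant to be bonding slaves.
        grep -q '^MASTER=' "$f" || continue
        # Accept SLAVE=yes with or without double quotes, in any case.
        if ! grep -q -i -E '^SLAVE="?yes' "$f"; then
            echo "$f: MASTER is set but SLAVE is not yes"
            rc=1
        fi
    done
    return $rc
}
```

Calling check_bond_slaves with no argument checks the default directory and returns non-zero if any file is flagged.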


Once the configuration is fixed, you can verify the bond with:

[root@node01 network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 1000
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:d9:2b:f4:a1:9a
Slave queue ID: 0

Slave Interface: eth9
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 44:1e:a1:17:f8:76
Slave queue ID: 0
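If you only want the essentials from that output, a small awk filter can pull out the active slave and the MII status of each slave. This is a rough sketch that assumes the output format shown above; bond_summary is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: summarize a bonding status file in the format shown above.
# bond_summary is a hypothetical helper; the file argument defaults to bond1.
bond_summary () {
    local file=${1:-/proc/net/bonding/bond1}
    awk -F': ' '
        /^Currently Active Slave:/  { print "active=" $2 }
        /^Slave Interface:/         { slave = $2 }
        # The bond-level MII Status line comes before any slave, so skip it.
        /^MII Status:/ && slave != "" { print slave "=" $2 }
    ' "$file"
}
```

For the bond1 output above, this would print the active slave followed by one line per slave interface with its link state.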

Tuesday, September 17, 2013

Script to kill a command after a timeout

If you don't want to wait for a command to time out on its own, you can use a bash function as a wrapper that kills the process once your defined timeout expires. I came across a similar function in a senior colleague's code before, but I can't recall exactly how it went, so I have rewritten it from memory. I hope it helps with your bash scripting.

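function run_with_command () {
    # Usage: run_with_command "<command with args>" <timeout in seconds>
    local CMD=$1
    local TIMEOUT=$2
    local COUNTER=0

    # Start the command in the background and remember its PID.
    # CMD is left unquoted on purpose so that its arguments are split.
    ${CMD} &
    local CMD_PID=$!

    # Poll once per second until the command exits or the timeout is reached.
    while kill -0 "$CMD_PID" 2>/dev/null && [ "$COUNTER" -lt "$TIMEOUT" ]; do
        sleep 1
        COUNTER=$((COUNTER+1))
    done

    # Kill the command only if it is still running after the timeout.
    if kill -0 "$CMD_PID" 2>/dev/null; then
        kill "$CMD_PID" 2>/dev/null
    fi

    # Reap the background process so it does not linger as a zombie.
    wait "$CMD_PID" 2>/dev/null
}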
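As an alternative to the polling loop above, systems that ship GNU coreutils with the timeout command can get the same effect with less code. The sketch below assumes timeout is installed; run_or_report is a hypothetical wrapper name used only for illustration.

```shell
#!/bin/bash
# Sketch: the same idea using the coreutils `timeout` command (assumed present).
# run_or_report is a hypothetical wrapper name.
run_or_report () {
    local secs=$1
    shift
    # Run the remaining arguments as the command, limited to $secs seconds.
    timeout "$secs" "$@"
    local rc=$?
    # coreutils timeout exits with status 124 when the time limit was hit.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}
```

For example, run_or_report 5 ping -c 100 somehost would stop the ping after five seconds. Because the command is passed as separate arguments rather than one string, arguments containing spaces survive intact.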

Script to rescan offline disks

I have a function that detects offline disks, deletes them, and then rescans for them. I hope it helps with your daily Linux operations.



function fix_stale_disk ()
{
        echo_header
        if grep -q -i offline /sys/block/*/device/state
        then

                #Checking for offline disks
                echo "Linux discovered offline disks."
                grep -l -i offline /sys/block/*/device/state

                echo "Do you want to fix the offline disks? (y/n)"
                read ans
                if [[ "$ans" == "y" || "$ans" == 'Y' ]]
                then

                        for block in $(grep -l -i offline /sys/block/*/device/state | awk -F/ '{print $4}')
                        do
                                status=""
                                echo -n "Deleting /dev/$block ..."
                                # Writing 1 to "delete" removes the device from the kernel.
                                if echo 1 > "/sys/block/$block/device/delete"
                                then
                                        status="Done"
                                else
                                        status="Failed"
                                fi
                                sleep 5
                                echo " $status"
                        done

                        echo ""
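                        # Rescan the disks by issuing a LIP on every Fibre Channel host.
                        for f in /sys/class/fc_host/*/issue_lip
                        do
                                # Skip the unexpanded pattern when no FC host exists.
                                [ -e "$f" ] || continue
                                echo "Rescanning $f ..."
                                echo 1 > "$f"
                                sleep 5
                        done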
                else
                        echo "You have halted the operation."
                        exit 1
                fi
        else
                echo "All disks are good."

        fi
}
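Since deleting a device is not something you want to do by accident, it can help to list the offline disks first without changing anything. Here is a read-only sketch of the detection step; list_offline_disks is a hypothetical helper name, and the optional argument that overrides the sysfs root exists only to make it easy to try against a test directory.

```shell
#!/bin/bash
# Sketch: print /dev names of disks whose sysfs state is "offline".
# list_offline_disks is a hypothetical helper; it never writes to sysfs.
list_offline_disks () {
    # Optional argument overrides the sysfs root (assumption, for testing).
    local root=${1:-/sys/block}
    local state dev
    for state in "$root"/*/device/state; do
        [ -r "$state" ] || continue
        if grep -q -i offline "$state"; then
            dev=${state#"$root"/}    # strip the leading root directory
            echo "/dev/${dev%%/*}"   # keep only the device name
        fi
    done
}
```

Running list_offline_disks with no argument prints one line per offline disk and prints nothing when all disks are healthy.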