Hiu and Linux: September 2013

Sunday, September 29, 2013

Configuring bonding: RTNETLINK answers: File exists

Hi all,

Today I was trying to configure active/passive bonding on Linux, but I kept hitting the same error whenever I restarted the network service.

[root@node01 network-scripts]# /etc/init.d/network restart
Shutting down interface bond0: [ OK ]
Shutting down interface bond1: [ OK ]
Shutting down interface eth12: [ OK ]
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface bond0: [ OK ]
Bringing up interface bond1: [ OK ]
Bringing up interface eth2: RTNETLINK answers: File exists
[ OK ]
Bringing up interface eth12: [ OK ]

I spent a couple of minutes troubleshooting with no luck. After a break, I came back, looked closely at my ifcfg-eth* configuration files, and found a silly mistake: I had set SLAVE=no in ifcfg-eth2, even though the same file sets MASTER=bond1. That is why the error kept appearing.

[root@node01 network-scripts]# grep bond1 ifcfg-*
ifcfg-bond1:DEVICE=bond1
ifcfg-eth2:MASTER=bond1
ifcfg-eth9:MASTER=bond1

[root@node01 network-scripts]# cat ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=no
USERCTL=no
HWADDR=3C:D9:2B:F4:A1:9A

[root@node01 network-scripts]# cat ifcfg-eth9
DEVICE=eth9
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
USERCTL=no
HWADDR=44:1E:A1:17:F8:76

I hope you won't be as silly as I was.
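To catch this kind of mismatch before restarting the network, a small check along the following lines may help. This is only a sketch: the function name find_bad_slaves is made up for illustration, and it assumes the plain KEY=value layout shown in the files above (optionally with the value in double quotes).

```shell
# Hypothetical helper: print every ifcfg-* file in a directory that
# names a MASTER but does not also contain SLAVE=yes.
find_bad_slaves() {
    local dir=${1:-/etc/sysconfig/network-scripts}
    local f
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        if grep -q '^MASTER=' "$f" && ! grep -Eq '^SLAVE="?yes"?$' "$f"; then
            # Strip the directory part so only the file name is printed
            echo "${f##*/}"
        fi
    done
}
```

Run with no argument it looks in /etc/sysconfig/network-scripts; with the files shown above it would print ifcfg-eth2.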

After fixing it, you can verify your configuration in /proc/net/bonding:

[root@node01 network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 1000
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:d9:2b:f4:a1:9a
Slave queue ID: 0

Slave Interface: eth9
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 44:1e:a1:17:f8:76
Slave queue ID: 0
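If you want to check this from a script rather than by eye, the status file can be parsed with awk. The two function names below are made up for illustration, and the sketch assumes the "Key: value" layout shown in the output above.

```shell
# Hypothetical helpers for reading a bonding status file such as
# /proc/net/bonding/bond1.

# Print the name of the currently active slave.
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ {print $2}' "$1"
}

# Print "<slave> <MII status>" for each slave interface.
# The bond-level "MII Status" line comes before any slave section,
# so it is skipped while no slave name has been seen yet.
slave_status() {
    awk -F': ' '
        /^Slave Interface:/ {slave = $2}
        /^MII Status:/ && slave != "" {print slave, $2; slave = ""}
    ' "$1"
}
```

For example, active_slave /proc/net/bonding/bond1 would print eth2 for the output above.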

Tuesday, September 17, 2013

Script to kill a command after a timeout

If you don't want to wait for a command to time out on its own, you can wrap it in a bash function that kills its PID once a timeout you define has elapsed. I came across a similar function in a senior colleague's code, but I couldn't recall exactly how it went, so I rewrote it from memory. I hope it helps with your bash scripting.

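function run_with_command () {
    # $1: command line to run, $2: timeout in seconds
    CMD=$1
    TIMEOUT=$2
    COUNTER=0

    # Run the command in the background and remember its PID
    ${CMD} &
    CMD_PID=$!

    # Poll once per second until the command exits or the timeout is reached
    while ps -p "$CMD_PID" > /dev/null && [ "$COUNTER" -lt "$TIMEOUT" ]; do
        sleep 1
        COUNTER=$((COUNTER+1))
    done

    # Kill the command if the timeout was reached
    if [ "$COUNTER" -eq "$TIMEOUT" ]
    then
        kill "$CMD_PID" 2>/dev/null
    fi

    # Reap the background process
    wait "$CMD_PID" 2>/dev/null
}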
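On systems that ship GNU coreutils, the timeout utility does a similar job without a polling loop. This is a different technique from the function above, shown only as an alternative sketch; the wrapper name run_limited is made up for illustration. timeout exits with status 124 when it had to kill the command.

```shell
# Hypothetical wrapper around GNU coreutils `timeout`.
# Usage: run_limited <seconds> <command> [args...]
run_limited() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local rc=$?
    # 124 is the exit status timeout uses when the time limit was hit
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s" >&2
    fi
    return "$rc"
}
```

For example, run_limited 1 sleep 5 returns 124 after about one second, while run_limited 5 true returns 0 straight away.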

Script to rescan the offline disk

I have a function that detects offline disks, deletes them, and then rescans. I hope it helps with your daily Linux operations.

function fix_stale_disk ()
{
    # echo_header is a helper that is not defined in this snippet
    echo_header
    if grep -q -i offline /sys/block/*/device/state
    then
        # Report the offline disks
        echo "Linux discovered offline disks."
        grep -l -i offline /sys/block/*/device/state

        echo "Do you want to fix the offline disks? (y/n)"
        read -r ans
        if [[ "$ans" == "y" || "$ans" == "Y" ]]
        then
            # /sys/block/<disk>/device/state: field 4 is the disk name
            for block in $(grep -l -i offline /sys/block/*/device/state | awk -F/ '{print $4}')
            do
                echo -n "Deleting /dev/$block ..."
                if echo 1 > "/sys/block/$block/device/delete"
                then
                    status="Done"
                else
                    status="Failed"
                fi
                sleep 5
                echo " $status"
            done

            echo ""
            # Issue a LIP on each FC host so the disks are rediscovered
            for f in /sys/class/fc_host/*/issue_lip
            do
                [ -e "$f" ] || continue
                echo "Rescanning $f ..."
                echo 1 > "$f"
                sleep 5
            done
        else
            echo "You have halted the operation."
            exit 1
        fi
    else
        echo "All disks are good."
    fi
}
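The detection step can also be pulled out into a read-only helper, which is handy for a dry run before deleting anything. The sketch below is not part of the original function: the name list_offline_disks and the optional root-directory argument are assumptions added for illustration.

```shell
# Hypothetical helper: print the names of block devices whose
# device/state file contains "offline". Writes nothing to sysfs.
list_offline_disks() {
    local root=${1:-/sys/block}
    local state disk
    for state in "$root"/*/device/state; do
        [ -r "$state" ] || continue
        if grep -qi offline "$state"; then
            # <root>/<disk>/device/state -> <disk>
            disk=${state%/device/state}
            echo "${disk##*/}"
        fi
    done
}
```

Called without arguments it reads /sys/block; passing another directory makes it easy to try against a copy of the tree.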