While working with Linux boxes in the Glued environment, we have needed a couple of boxes with multiple IP addresses, and the default Red Hat network configuration scripts do not seem to work correctly for what we needed in some of these cases. This document lists the problems seen and some hacks to fix them.
Note: These fixes are called hacks for a reason. While I do use them on several production machines I am responsible for, they are hacks and there is no guarantee of any kind that they will work, or even that they will not make things worse, for you. Use at your own risk. While constructive comments are welcome, and I'll even entertain requests for assistance with these hacks, assistance to others comes after my own work obligations, and my free time is severely limited.
Two problems have been seen: a box with multiple NICs on different subnets, and a box with multiple IP addresses on the same subnet.
The situation: A single box has multiple NICs in it, each connected to a different subnet (and therefore with distinct IP addresses). For specificity in the following, let us assume it has two NICs. One, NICA, has IP address IPaddrA on the subnetA subnet. The other, NICB, has IP address IPaddrB on the subnetB subnet.
The symptoms: All machines on subnetA can see the box using IPaddrA. Similarly, boxes on subnetB can see the box using IPaddrB. I believe you should also be able to see either address (IPaddrA or IPaddrB) from the other subnet (subnetB or subnetA, respectively), but I won't guarantee it. The problem is that outside hosts, on neither subnetA nor subnetB, can only see the machine using one of the two addresses, and get no response from the other one.
Specifics: Observed with Glued Red Hat Enterprise Edition v3 for x86 based processors. Mainly seen on one box, a Dell PowerEdge 1650 with dual onboard Intel 82544EI NICs.
My analysis: Let us assume that it is IPaddrA which is visible from the outside world, and IPaddrB that is blocked. What appears to be happening is that both NICs function properly with respect to traffic on their own subnet. IPaddrA functions properly even for traffic not on subnetA: when a machine on some other net tries to contact it, the subnetA gateway sends the packets to NICA, and the response goes out on NICA back to the gateway, with a source address of IPaddrA and a destination address of the foreign machine's IP address.
When a machine not on subnetB tries to talk to IPaddrB, things start the same. The subnetB gateway sends the packets to NICB, the Linux box decides how to respond, and a response is sent out. However, the response goes out on NICA but with the IPaddrB source address. If the machine being replied to is on subnetA, the packets seem to get to the destination and no one complains. But if the packets are for another subnet, the router drops them because the source address is illegal for subnetA (it is IPaddrB, which is a subnetB address).
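One way to check this analysis on a given box is to ask the kernel which interface it would use for a reply from a particular source address. The following is a minimal sketch, not something taken from our production setup; it assumes the ip command is on the PATH, that the source address is configured on the host, and the function names egress_dev and check_egress are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any standard tool).

# Print the interface name that follows the "dev" keyword in
# `ip route get` output read from stdin.
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Check whether traffic sourced from SRC toward DST would leave on WANT_DEV.
# SRC must be an address configured on this host.
check_egress() {
    src=$1; dst=$2; want_dev=$3
    got_dev=$(ip route get "$dst" from "$src" | egress_dev)
    if [ "$got_dev" = "$want_dev" ]; then
        echo "OK: $src -> $dst leaves via $got_dev"
    else
        echo "MISMATCH: $src -> $dst leaves via ${got_dev:-unknown}, expected $want_dev"
        return 1
    fi
}
```

For the scenario above, one would call check_egress with IPaddrB, the address of some host outside both subnets, and the interface name of NICB. A MISMATCH naming NICA's interface would be consistent with the behavior described.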
Hack to fix it: In the rc.machine file, use the /sbin/ip command to set up a somewhat more complicated routing scenario with a separate routing table for each subnet. For each subnet, the routing table simply goes out through the NIC if local, or through the NIC to the appropriate gateway if non-local. Then hook these tables into the routing rules based on the source IP address.
For example, if the two subnets are 126.96.36.199/23 and 172.80.24/23 on eth0 and eth1, respectively, with 188.8.131.52 and 184.108.40.206 as the gateways, you can do something like
#Set up the first subnet's routing table (we'll name it 70)
ip route flush table 70
ip route add table 70 to 220.127.116.11/23 dev eth0
ip route add table 70 to default via 18.104.22.168 dev eth0
#Set up the second subnet's routing table (we'll call it 80)
ip route flush table 80
ip route add table 80 to 22.214.171.124/23 dev eth1
ip route add table 80 to default via 126.96.36.199 dev eth1
#Create the rules to choose what table to use. Choose based on source IP
#We need to give the rules different priorities; for convenience name priority
#after the table
ip rule add from 188.8.131.52/23 table 70 priority 70
ip rule add from 184.108.40.206/23 table 80 priority 80
#Flush the cache to make effective
ip route flush cache
Physics typically puts this into a file called rc.linux-dual-net-route-hack in the sysconfig tree and calls this script from rc.machine. This seems to work fine, as the primary interface works properly even without the hack, and that is the interface used to communicate with AFS, KDC, etc. servers, so the machine seems to boot OK. The extra bit of network connectivity gained by the other NIC can wait until the rc.machine script gets run.
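Since the setup is the same for each subnet, it can be sketched as a small function that prints the commands for one subnet. This is only an illustration: the function name subnet_table_cmds is made up, and it follows the convention above of reusing the table number as the rule priority.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that build one subnet's routing
# table and hook it into the rules by source address.
# Arguments: table id, subnet in CIDR form, gateway address, interface name.
subnet_table_cmds() {
    table=$1; subnet=$2; gateway=$3; dev=$4
    echo "ip route flush table $table"
    echo "ip route add table $table to $subnet dev $dev"
    echo "ip route add table $table to default via $gateway dev $dev"
    # The priority is named after the table, as in the hand-written version.
    echo "ip rule add from $subnet table $table priority $table"
}
```

Printing the commands rather than running them makes it easy to inspect the result first. To apply it, pipe the output of one call per subnet into sh as root, and then run ip route flush cache once at the end.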
The situation: A single box has multiple IP addresses on the same subnet (in the observed cases, all on the same NIC; I am not sure whether that matters). For specificity, assume it has two IP addresses, IPaddrA and IPaddrB, on the subnetA subnet.
The symptoms: The machine boots fully and appears to be up and happy. However, network-based logins get denied. It is possible to log in on the console, but even then there are some problems. Most notably, attempting to ksu to root yields an error message about a wrong target hostname or IP address. Basically, pure Unix stuff works, but a lot of AFS/Kerberos related stuff has problems.
Specifics: Observed on a number of Glued Red Hat Enterprise Edition v3 systems for x86 based processors. Systems observed on include a number of Dell PowerEdge 1650s and 1750s. The systems were all using one of the onboard NICs, which were Broadcom NetXtreme BCM5704 Gigabits and Intel 82544EI Gigabits. In all cases tried, two or three IP addresses were attached to the same NIC. Note: I tried it on a Sun V20 AMD64 box, and the problem was not seen. I am not sure why there is a difference.
My analysis: The presence of multiple IP addresses appears to cause the system to create a rather complicated route table, with what appears to be 2N-1 default routes, where N is the number of IP addresses. The basic route command does not help much: its output provides little information to distinguish the default routes, other than that one has metric 1. Using the /sbin/ip route command, we can see a bit more.
I am not an expert at reading route entries, but I would normally expect to see a single default route on a subnet, corresponding to the second line of the /sbin/ip route output (without the src specification).
What appears to be happening (based on my interpretation of the route entries and on sniffing the network traffic) is that traffic originating from the host to Hesiod, KDC, or AFS servers uses the second (or last) IP address as the source address. As the primary machine name is based on the first IP address, Kerberos is not happy, and all the Kerberos stuff appears to fail.
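A quick way to see whether a box is affected is to count the default routes. The sketch below assumes that ip route show prints each default route on a line beginning with the word default. The function names count_default_routes and check_single_default are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative).

# Count the default-route entries in `ip route show` output read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_default_routes() {
    grep -c '^default' || true
}

# Warn unless the main routing table holds exactly one default route.
check_single_default() {
    n=$(ip route show | count_default_routes)
    if [ "$n" -ne 1 ]; then
        echo "WARNING: found $n default routes, expected 1"
        return 1
    fi
    echo "OK: exactly one default route"
}
```

Running check_single_default on a box with N addresses on one subnet should, if the analysis above is right, report more than one default route.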
Hack to fix it: The solution appears to be to delete all the existing default routes and add a proper default route. This can be done manually, booting the machine into single user mode, starting up networking (e.g.
To fix the problem in a more automated fashion, we run the following in rc.machine, or more typically, create a script rc.linux-multi-ips-on-subnet-route-hack in the sysconfig tree and run that from the rc.machine file. The script consists of
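echo "Fixing default route..."
# GATEWAY must already be set to the subnet's gateway address
RES=`route | grep default`
# Delete default routes one at a time until none remain
while [ "x$RES" != "x" ]
do
    route del default
    RES=`route | grep default`
done
route add default gw $GATEWAY
echo "default route should be fixed"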
Currently, Physics is running this from the rc.machine file (directly or indirectly), and this appears to be working. We need to look into it a bit more and ensure nothing requiring Kerberos identity is breaking due to the lateness with which this hack is applied.
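One weakness of the delete loop is that it would spin forever if route del default ever failed while a default route remained. A more defensive variant might look like the sketch below. It is not what we currently run: the function name fix_default_route and the retry limit are made up, and it swaps in route -n (where the default route shows up with destination 0.0.0.0) so that the check does not depend on name lookups.

```shell
#!/bin/sh
# Hypothetical variant of the default-route hack (not the production script).
# Deletes every default route, then installs a single one via the gateway.
# Arguments: gateway address, optional maximum number of deletions (default 10).
fix_default_route() {
    gateway=$1
    max_tries=${2:-10}
    tries=0
    # With -n, a default route is listed with destination 0.0.0.0.
    while route -n | grep -q '^0\.0\.0\.0[[:space:]]'; do
        if [ "$tries" -ge "$max_tries" ]; then
            echo "giving up after $tries deletions" >&2
            return 1
        fi
        route del default || return 1
        tries=$((tries + 1))
    done
    route add default gw "$gateway"
}
```

It would be called as fix_default_route followed by the gateway address, in place of the loop in the script above.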