darkone (\/) (;,,,;) (\/) 11610 Posts
I'm configuring a new machine in my lab. I'm not an expert and I'm learning as I go. I've tried to set up NIC bonding (mode 6). I've followed the instructions at: http://www.linuxtopia.org/online_books/rhel6/rhel_6_deployment/rhel_6_deployment_sec-Using_Channel_Bonding.html
ifconfig shows the eth0, eth1, and bond0 interfaces. The master/slave statuses are correct. They all show the same IP like they should.
I can access machines inside my LAN, but nothing outside. Pings to 192.168.xxx.xxx addresses work fine. Pings and nslookups to anything else don't work. I thought it was a DNS problem, but specifying the DNS servers' IP addresses in the ifcfg-bond0 file didn't help. A few things I've read have suggested that the routing table might be the problem, but I don't know what the output of the route command is supposed to look like, nor what to change if it's a problem.
I'm also getting random kernel crashes that require a reboot in order to get the network connections working again. Also, ever since I manually edited the /etc/sysconfig/network-scripts/ifcfg-# files and created and then edited /etc/modprobe.d/bonding.conf, NetworkManager hasn't really had a handle on what the network interfaces are trying to do.
I'm currently reinstalling CentOS since reverting all the files I edited back to their original states didn't revert everything back to normal.
I'm out of my realm of experience with this. I'm a scientist, not a sysadmin. Can anyone give some advice on how I should proceed to get bonding working? I've found plenty of guides for setting up bonding online, but almost nothing about how to proceed if things aren't working correctly. 11/22/2011 2:51:27 PM |
KillaB All American 1652 Posts
When you try it again, if you experience issues, post your ifcfg-ethX (and bondX) contents here so we can assist.
http://www.how2centos.com/centos-6-channel-bonding/ should provide a pretty easy/simple method for configuring your network bond. 11/22/2011 3:14:18 PM |
darkone (\/) (;,,,;) (\/) 11610 Posts
^ My ifcfg files looked just like the ones in your link save for the following line in the bond0 file since setting mode in the /etc/modprobe.d/bonding.conf file is no longer supported:
BONDING_OPTS="mode=6 miimon=100" 11/22/2011 5:16:24 PM |
KillaB All American 1652 Posts
/etc/modprobe.d/bonding.conf is supported, /etc/modprobe.conf is what is deprecated. I would remove the BONDING_OPTS in your ifcfg file and place
Quote : | "alias bond0 bonding
options bond0 mode=6 miimon=100" |
in the /etc/modprobe.d/bonding.conf file instead and restart your network service. 11/22/2011 6:09:27 PM |
raiden All American 10505 Posts
yeah if you have issues post the files here. 11/22/2011 7:10:12 PM |
darkone (\/) (;,,,;) (\/) 11610 Posts
^^ From: http://www.linuxtopia.org/online_books/rhel6/rhel_6_deployment/rhel_6_deployment_s2-networkscripts-interfaces-chan.html
Quote : | "Parameters for the bonding kernel module must be specified as a space-separated list in the BONDING_OPTS="" directive in the ifcfg-bond interface file. Do not specify options for the bonding device in /etc/modprobe.d/.conf, or in the deprecated /etc/modprobe.conf file." |
When I initially went to set up bonding, I specified mode in the bonding.conf file and it didn't work in CentOS6. From what I've read, the only thing needed in the bonding.conf file is the line:
alias bond0 bonding
On a related note, I'm starting to think that my motherboard's (Supermicro X8DTE-F) Intel 82574L controller may have an issue with the e1000e driver bundled in CentOS 6, based on this tidbit from Supermicro: http://www.supermicro.nl/support/faqs/faq.cfm?faq=12170 11/22/2011 9:36:48 PM |
darkone (\/) (;,,,;) (\/) 11610 Posts
Looks like at least some of my network problems are a legitimate bug/poor upstream driver support: https://bugzilla.redhat.com/show_bug.cgi?id=632650 11/22/2011 10:13:42 PM |
BIGcementpon Status Name 11318 Posts
If you can ping in your subnet, but not out of it, do you have a default gateway set correctly? 11/23/2011 8:31:40 AM |
darkone (\/) (;,,,;) (\/) 11610 Posts
^ You're going to have to specify what's correct.
My Configuration:
#more /etc/sysconfig/network-scripts/bond0
DEVICE=bond0
IPADDR=192.168.100.15
NETWORK=192.168.100.1
NETMASK=255.255.255.0
DNS1=152.1.1.248
DNS2=152.1.1.206
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=6 miimon=100"

#more /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

#more /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

#more /etc/modprobe.d/bonding.config
alias bond0 bonding

# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:25:90:60:5C:F2
          inet addr:192.168.100.15  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fe60:5cf2/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:3366 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5358 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:327316 (319.6 KiB)  TX bytes:344108 (336.0 KiB)

eth0      Link encap:Ethernet  HWaddr 00:25:90:60:5C:F2
          inet addr:192.168.100.15  Bcast:192.168.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1765 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2806 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:173569 (169.5 KiB)  TX bytes:173785 (169.7 KiB)
          Interrupt:16 Memory:face0000-fad00000

eth1      Link encap:Ethernet  HWaddr 00:25:90:60:5C:F3
          inet addr:192.168.100.15  Bcast:192.168.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1601 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2552 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:153747 (150.1 KiB)  TX bytes:170323 (166.3 KiB)
          Interrupt:17 Memory:fade0000-fae00000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1662 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1662 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:156701 (153.0 KiB)  TX bytes:156701 (153.0 KiB)

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:25:90:60:5c:f2

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:25:90:60:5c:f3

# ethtool bond0
Settings for bond0:
        Link detected: yes

# ethtool -i bond0
driver: bonding
version: 3.5.0
firmware-version: 2
bus-info:

# ethtool -i eth0
driver: e1000e
version: 1.6.2-NAPI
firmware-version: 1.9-0
bus-info: 0000:03:00.0

# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes

# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   *               255.255.255.0   U     0      0        0 bond0
192.168.100.0   *               255.255.255.0   U     1      0        0 eth1
192.168.100.0   *               255.255.255.0   U     1      0        0 eth0
link-local      *               255.255.0.0     U     1004   0        0 bond0
default         192.168.100.1   0.0.0.0         UG    0      0        0 eth0

If you need any more relevant information, just ask. 11/23/2011 1:13:31 PM |
mellocj All American 1872 Posts
i am not sure this is the cause of your problem, but this is incorrect:
Quote : | "#more /etc/sysconfig/network-scripts/bond0 DEVICE=bond0 IPADDR=192.168.100.15 NETWORK=192.168.100.1 NETMASK=255.255.255.0" |
you should have:
Quote : | "NETWORK=192.168.100.0 GATEWAY=192.168.100.1" |
what kind of switch are you using? i haven't tried to do bonding like you're talking about.. i'm assuming your switch doesn't support LACP?
[Edited on November 23, 2011 at 2:31 PM. Reason : tqq] 11/23/2011 2:30:51 PM |
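For reference, here is a minimal sketch of what ifcfg-bond0 might look like with the NETWORK and GATEWAY values suggested above folded into the configuration posted earlier. Every value is copied from the thread. Whether the GATEWAY line alone is enough to put the default route on bond0 is an assumption, not something the thread confirms.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (sketch, values taken from the thread)
DEVICE=bond0
IPADDR=192.168.100.15
NETMASK=255.255.255.0
# Network address of the subnet, not the gateway address
NETWORK=192.168.100.0
# Default gateway reachable through this interface
GATEWAY=192.168.100.1
DNS1=152.1.1.248
DNS2=152.1.1.206
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=6 miimon=100"
```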
darkone (\/) (;,,,;) (\/) 11610 Posts
^ I changed the line in the ifcfg-bond0 file to read: NETWORK=192.168.100.0
It didn't help.
I'm using balance-alb mode because it doesn't require anything of the switches. That said, I just have some consumer-level 8-port D-Link gigabit switches between me and the gateway (Cisco PIX something or other).
Does this routing table look correct?
# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   *               255.255.255.0   U     0      0        0 bond0
192.168.100.0   *               255.255.255.0   U     1      0        0 eth1
192.168.100.0   *               255.255.255.0   U     1      0        0 eth0
link-local      *               255.255.0.0     U     1004   0        0 bond0
default         192.168.100.1   0.0.0.0         UG    0      0        0 eth0
[Edited on November 23, 2011 at 3:21 PM. Reason : didn't help] 11/23/2011 3:13:14 PM |
darkone (\/) (;,,,;) (\/) 11610 Posts
Fuck yea!
# ip route change default via 192.168.100.1 dev bond0
worked by changing the default route from eth0 to bond0.
So the question becomes: why do I have to do that and how do I fix it?
I'm certain that NetworkManager is playing a role. I disabled it and the system never configured a default route to the gateway. I re-enabled it and added the line NW_CONTROLLED = no and the HWADD to the ifcfg-eth0 and ifcfg-eth1 files, but then things just didn't work at all.
Will uninstalling NetworkManager act any differently than disabling the service from running at startup? 11/23/2011 3:32:24 PM |
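The symptom described in this post (default route attached to eth0 instead of bond0) can be spotted mechanically. The following is a rough sketch, not a tool from the thread: the function name default_route_iface is made up for illustration, and it assumes the classic `route` / `route -n` column layout shown above, where the interface is the last field.

```shell
# Hypothetical helper: read `route` or `route -n` output on stdin and print
# the interface (last column) of the first default-route line.
default_route_iface() {
    # `route` prints "default" in the Destination column, `route -n` prints 0.0.0.0
    awk '($1 == "default" || $1 == "0.0.0.0") { print $NF; exit }'
}
```

On a live system it could be used as `route -n | default_route_iface`. With the routing table posted above it would report eth0, which is the mismatch that the `ip route change` command corrects.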
raiden All American 10505 Posts
First off this,
default 192.168.100.1 0.0.0.0 UG 0 0 0 eth0
looks incorrect. Your default route should be your bond0 interface.
Also, why are you using this
Bonding Mode: adaptive load balancing
as your bonding mode?
Are you trying to send traffic out of both interfaces and load balance that traffic? Or are you creating a bonded interface for failover purposes? (If the latter, I suggest you use mode "active-backup".) 11/25/2011 7:14:37 PM |
darkone (\/) (;,,,;) (\/) 11610 Posts
Load balancing is my intention.
If you look at my last post, you'll see that I figured out that the default route was being set incorrectly to the wrong device. Manually changing the routing table fixes my problem. I haven't been able to figure out why it's wrong in the first place and what configuration I need to change to get things to behave correctly.
I suspect that the NetworkManager program is the source of the problem, but I don't know enough about how it works and what it's doing "under the hood" to know how to address the issue.
I'm going to try disabling the NetworkManager on Monday and work towards manually specifying all my configuration options. Any advice towards that end would be appreciated. 11/26/2011 4:04:53 PM |
llama All American 841 Posts
Are you using NetworkManager and do you even want to use it? Do you have X installed on this system? If not, NetworkManager most likely isn't even installed. If NM is installed and you *don't* want to use it to control these interfaces, then you need to make sure the network service is started.
Try this:
1. Add the following option to ifcfg-bond0, ifcfg-eth0, and ifcfg-eth1
NM_CONTROLLED=no
notice this is different than what you posted earlier.
2. Restart the network service with:
service network restart
The ifconfig scripts should be smart enough to know the gateway should be on bond0, but if after restarting/starting the network service the gateway device is still wrong, then add the following to /etc/sysconfig/network:
GATEWAYDEV=bond0
then restart the network service again.
Let me know how that goes. 11/27/2011 1:47:11 PM |
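Since the earlier attempt used a misspelled key, a quick check that each file really contains the exact NM_CONTROLLED=no line may save a round trip. This is only a sketch: the function name missing_nm_controlled is invented for illustration, and the three file names are the ones used in this thread.

```shell
# Hypothetical helper: print the ifcfg files in the given directory that do
# NOT contain an exact NM_CONTROLLED=no line (optionally quoted).
missing_nm_controlled() {
    dir=$1
    for f in "$dir/ifcfg-bond0" "$dir/ifcfg-eth0" "$dir/ifcfg-eth1"; do
        # Skip files that do not exist on this machine
        [ -f "$f" ] || continue
        grep -Eq '^NM_CONTROLLED="?no"?[[:space:]]*$' "$f" || echo "$f"
    done
}
```

It could be run as `missing_nm_controlled /etc/sysconfig/network-scripts`; any file it prints still needs the option added. A line such as `NW_CONTROLLED = no` would not match and the file would be listed.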
raiden All American 10505 Posts
Also look into using 'ifenslave' 11/27/2011 5:33:57 PM |
llama All American 841 Posts
The network service will handle all of this for him with ease. No need to go manually running any of the scripts. 11/27/2011 5:41:11 PM |
raiden All American 10505 Posts
Network service won't handle the creation of bond interfaces. 11/28/2011 7:18:42 AM |
darkone (\/) (;,,,;) (\/) 11610 Posts
Disabling NetworkManager solved all my problems. Bonding is working (as best I can tell) and my routing table is configured correctly. 11/28/2011 10:53:11 AM |
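One way to go a bit beyond "as best I can tell" is to read the per-slave link state out of /proc/net/bonding/bond0. The sketch below assumes the file has the line-oriented layout that the bonding driver normally prints (one "Slave Interface:" line followed by an "MII Status:" line per slave); the function name bond_slave_status is hypothetical.

```shell
# Hypothetical helper: read /proc/net/bonding/<bond> text on stdin and print
# one "<slave> <MII status>" line per enslaved interface.
bond_slave_status() {
    awk -F': ' '
        # Remember which slave section we are in
        /^Slave Interface:/ { slave = $2 }
        # The bond-level "MII Status" line comes before any slave and is skipped
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    '
}
```

Usage would be `bond_slave_status < /proc/net/bonding/bond0`. Seeing both eth0 and eth1 reported as up shows the links are enslaved and alive, though it does not by itself prove that traffic is being balanced across them.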
llama All American 841 Posts
^^ Correct, the bond devices themselves will be enumerated when the bonding module is loaded. The network service will take care of enslaving other interfaces to them. If you've seen an instance where this isn't happening correctly, let me know and we can look into it. 11/28/2011 4:14:19 PM |
raiden All American 10505 Posts
Normally I don't use NetworkManager. I do my interface & bonding setup manually via ifconfig & ifenslave (respectively). I seem to recall there is something in NetworkManager that doesn't work right with bonding, but maybe I'm not remembering correctly; it's just a vague thought.
glad to hear your stuff is working now. 11/28/2011 8:47:28 PM |
darkone (\/) (;,,,;) (\/) 11610 Posts
^ It's my understanding that NetworkManager ignores bonded interfaces. However, since it still sees the slaved interfaces, it tries to enable them separately from the bonding. It also overrides configurations set outside of NetworkManager; hence my routing table problem. Supposedly NetworkManager will properly handle bonding in near-future releases. 11/28/2011 9:22:16 PM |