Tuesday, September 14, 2021

4x10Gb LACP LAG on a Linux box - inconsistent outbound load balancing with payload hashing enabled

Hey all, I have a real head-scratcher and could use a clue. At this point, I feel like I'm taking-crazy-pills.gif

I'm trying to verify that my load balancing setup is as good as it can be with what I have at my disposal. All switch-to-host links are up in an LACP LAG with payload hashing enabled on both the switch and the endpoint hosts, as best as I can verify by RTFMing and lots of google-fu. The configs are relatively straightforward. What I'm focused on is understanding why the Linux host doesn't seem to spread its outbound streams evenly across all 4 of its own links, and how to fix that. AFAICT, whatever is going on lives somewhere in my host config (or worse, a kernel bug? a total leap on my part) and not on the switch. Since the traffic originating from the host is leaving at a lower aggregate rate than I expected, I think my problem is independent of any switch misconfig.
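For anyone following along, here's a crude sketch of how the spread across the individual slaves can be watched while a test runs (slave names are the ones from my bond config further down; the counters come from the standard per-interface sysfs statistics):

# sample per-slave TX byte counters every second while iperf3 runs (Ctrl-C to stop)
while sleep 1; do
    for i in enp95s0f0 enp95s0f1 enp95s0f2 enp95s0f3; do
        printf '%-12s %s\n' "$i" "$(cat /sys/class/net/$i/statistics/tx_bytes)"
    done
    echo ---
done

"ip -s link show <slave>" reports the same counters if sysfs feels too raw.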

I admit my test bed isn't ultra-robust, but here goes. I'm using iperf3 to generate traffic in the hope of saturating all 4 of the 10Gb links. What I'm seeing is half the bandwidth I'd expect to be able to pump from one of the hosts: no matter how many parallel iperf3 streams I use, I can never seem to break ~20Gbps total across the 4x10Gb bond. IOW, if I try 4 streams I get about 20Gbps max combined rate, if I try 8 streams I get about the same (with more overhead), and if I go nuts and do 16 streams it's still about the same (with even more overhead, as expected).
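If it helps anyone reproduce this, a variation that takes the traffic generator itself out of the equation would be several independent iperf3 server/client pairs on distinct ports instead of a single client with -P, so no one iperf3 process is on the hook for the whole load. A rough sketch (port numbers are arbitrary; replace <otherhost> with the peer):

# on the receiving host: a few independent iperf3 servers, one per port, daemonized
for p in 5201 5202 5203 5204; do iperf3 -s -p $p -D; done

# on the sending host: one client per server, 2 streams each, run in parallel
for p in 5201 5202 5203 5204; do
    iperf3 -c <otherhost> -p $p -t 3800 -P 2 &
done
wait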

What I'm hoping to see is each of the 4 individual links' MRTG graphs getting close to 10Gbps, and the aggregate interfaces reaching up toward 40Gbps. What I'm actually seeing is about half that on each one, and I just don't get it.

Here's the basic test scenario:

  • Hosts and switch are air-gapped. There is zero production traffic contending with my tests.
  • 2 bare metal Linux hosts connected to the switch.
  • Each host has a 4x10Gb LACP LAG from a single Intel X710 NIC; all links are up, with good light levels and no errors on either end.
  • kernel bonding xmit_hash_policy is set to "layer3+4" (see the hash sketch right after this list)
  • kernel version is 5.4.106 (distro is Debian)
  • Sending 8 streams of traffic from the client with "iperf3 -c <otherhost> -t 3800 -P 8" and just watching the output and the traffic stats collect in MRTG over the course of an hour while I do other things.
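Since the hash policy is what decides which slave a given stream goes out on, here's my rough mental model of layer3+4, paraphrased from Documentation/networking/bonding.rst (newer kernels go through the flow dissector, so treat this as a simplification). The source port and the peer's .2 address below are made up purely for illustration:

# simplified layer3+4 hash per the bonding docs:
# XOR the packed L4 ports with the L3 addresses, fold, then mod the slave count
ports=$(( (46238 << 16) | 5201 ))              # hypothetical src port, iperf3's default dst port 5201
sip=$(( (192<<24)|(168<<16)|(255<<8)|1 ))      # 192.168.255.1 (my side)
dip=$(( (192<<24)|(168<<16)|(255<<8)|2 ))      # 192.168.255.2 (assumed peer, for illustration)
hash=$(( ports ^ sip ^ dip ))
hash=$(( hash ^ (hash >> 16) ))
hash=$(( hash ^ (hash >> 8) ))
echo "this flow maps to slave index $(( hash % 4 ))"   # 4 slaves in the bond

With both IPs fixed and the iperf3 server port fixed, only the client source ports differ between streams, so which slave each stream lands on comes down to how those hashes fall modulo 4.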

I'd really appreciate any clue at all on what to try next. I'm pretty lost.

Example output of an iperf3 interval with 8 streams outbound:

- - - - - - - - - - - - - - - - - - - - - - - - -
[  5] 1108.00-1109.00 sec   314 MBytes  2.63 Gbits/sec    0    300 KBytes
[  7] 1108.00-1109.00 sec   313 MBytes  2.62 Gbits/sec    0    331 KBytes
[  9] 1108.00-1109.00 sec   312 MBytes  2.62 Gbits/sec    0    372 KBytes
[ 11] 1108.00-1109.00 sec   312 MBytes  2.62 Gbits/sec    0    443 KBytes
[ 13] 1108.00-1109.00 sec   312 MBytes  2.62 Gbits/sec    0    592 KBytes
[ 15] 1108.00-1109.00 sec   314 MBytes  2.63 Gbits/sec    0   1021 KBytes
[ 17] 1108.00-1109.00 sec   314 MBytes  2.63 Gbits/sec    0    728 KBytes
[ 19] 1108.00-1109.00 sec   312 MBytes  2.62 Gbits/sec    0    296 KBytes
[SUM] 1108.00-1109.00 sec  2.44 GBytes  21.0 Gbits/sec    0

Here's my config with very limited redaction:

# from /etc/network/interfaces

auto bond0
iface bond0 inet manual
    bond-slaves enp95s0f0 enp95s0f1 enp95s0f2 enp95s0f3
    bond-mode 802.3ad
    bond-miimon 100
    bond-downdelay 200
    bond-updelay 200
    bond-lacp-rate 1
    bond-minlinks 1
    bond-xmit-hash-policy layer3+4

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr1.1000
iface vmbr1.1000 inet static
    address 192.168.255.1
    netmask 24
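And in case the ifupdown options aren't being applied the way I assume, the kernel's live view of the bond parameters lives in sysfs; something like this dumps the ones that matter here (these are the standard bonding sysfs entries):

# compare what the kernel actually applied against /etc/network/interfaces
for f in mode xmit_hash_policy lacp_rate miimon min_links ad_select; do
    printf '%-18s %s\n' "$f:" "$(cat /sys/class/net/bond0/bonding/$f)"
done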

Bonding driver information:

root@metal1:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200
Peer Notification Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 40:a6:b7:4b:72:18
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 4
        Actor Key: 15
        Partner Key: 7
        Partner Mac Address: 04:05:06:07:08:06

Slave Interface: enp95s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 40:a6:b7:4b:72:18
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 40:a6:b7:4b:72:18
    port key: 15
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 04:05:06:07:08:06
    oper key: 7
    port priority: 127
    port number: 3
    port state: 63

Slave Interface: enp95s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 40:a6:b7:4b:72:19
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 40:a6:b7:4b:72:18
    port key: 15
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 04:05:06:07:08:06
    oper key: 7
    port priority: 127
    port number: 4
    port state: 63

Slave Interface: enp95s0f2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 40:a6:b7:4b:72:1a
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 40:a6:b7:4b:72:18
    port key: 15
    port priority: 255
    port number: 3
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 04:05:06:07:08:06
    oper key: 7
    port priority: 127
    port number: 3
    port state: 63

Slave Interface: enp95s0f3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 40:a6:b7:4b:72:1b
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 40:a6:b7:4b:72:18
    port key: 15
    port priority: 255
    port number: 4
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 04:05:06:07:08:06
    oper key: 7
    port priority: 127
    port number: 4
    port state: 63
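In case it saves anyone some squinting at that wall of text, a quick one-liner to confirm all four slaves joined the same aggregator (they do here, all report Aggregator ID 2):

# summarize slave -> aggregator ID from the proc output above
awk '/^Slave Interface:/ {s=$3} /^Aggregator ID:/ && s {print s, "-> aggregator", $3}' /proc/net/bonding/bond0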

