I understand that most manufacturers recommend use of auto-negotiation with gigabit links as best practice, but today we got burned. A member of an LACP aggregate failed to auto-negotiate properly, and each end of the link handled this differently, causing the entire aggregate to flap. I cannot find any documentation to see if what happened is expected according to the standards, or if we might have a device that is not functioning in compliance with the standards (in which case we can open a support case with the offending vendor).
Two questions:
- Is there a documented best practice for LACP that recommends avoiding link speed auto-negotiation?
- Does the LACP standard define specific logic that a device must use when detecting and responding to a speed mismatch between member ports? (For example: Does the fastest link win? Does the first link bundled win? Is it some other criterion?)
The Details
4 x 1 Gbps links bundled in an LACP EtherChannel between a NetApp and a Cisco IOS switch. Ports are configured for auto speed/duplex on each side.
- A power failure occurred and devices power cycled.
- One of the ports auto-negotiated to 100Mbps (consider this a separate issue, root cause of this is not the primary concern, but this triggered the more concerning issue).
- Cisco Switch detected that this 1 port (100Mbps) was not compatible with the other ports (1000Mbps), and removed it from the bundle. This seems like the ideal behavior (but is it standard?).
- NetApp detected that the other 3 ports (1000Mbps) were not compatible with the one port (100Mbps) and removed all 3 of them from the bundle. This seems like the opposite of the ideal behavior (but did they violate the standard?).
- This led to continuous flapping, with the NetApp repeatedly removing and re-adding the 3 ports, and the switch repeatedly removing and re-adding the 1 port.
- After physically disconnecting and reconnecting the link that negotiated to 100Mbps, it successfully auto-negotiated back to 1000Mbps and the connection was once again stable. We are replacing the cable in case it is faulty and caused the failed auto-negotiation, but my concern is that a single failed auto-negotiation can take down an entire bundle like this.
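To make the disagreement concrete, here is a toy Python model of the sequence above. It is only a sketch of the observed outcome, not either vendor's actual selection logic. The port names, the "majority speed" rule for the switch, and the "match a reference port" rule for the NetApp are all assumptions made for illustration.

```python
from collections import Counter

# Negotiated speed (Mbps) of each member link after the power cycle.
# Port names are hypothetical labels, not real interface names.
link_speeds = {"p1": 1000, "p2": 1000, "p3": 1000, "p4": 100}


def keep_majority_speed(speeds):
    """Assumed switch-like rule: keep ports that share the most common speed."""
    common_speed, _ = Counter(speeds.values()).most_common(1)[0]
    return {port for port, speed in speeds.items() if speed == common_speed}


def keep_reference_speed(speeds, reference_port):
    """Assumed NetApp-like rule: keep ports matching one reference port's speed."""
    reference_speed = speeds[reference_port]
    return {port for port, speed in speeds.items() if speed == reference_speed}


switch_active = keep_majority_speed(link_speeds)
netapp_active = keep_reference_speed(link_speeds, "p4")

# A link carries traffic only if both ends keep it in the bundle.
usable = switch_active & netapp_active

print(sorted(switch_active))  # ['p1', 'p2', 'p3']
print(sorted(netapp_active))  # ['p4']
print(sorted(usable))         # []
```

Under these assumed rules, each end keeps a set of ports that the other end has dropped, so no link is active on both sides at once. That matches the symptom we saw: the aggregate as a whole went down even though three of the four links were healthy.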