Friday, May 29, 2020

Can I trust LACP to deal with an online but suddenly unconfigured switch properly?

Hey everyone,

I’m trying to avoid an outage on a site with minimal local support and need clarity on one point.

The background to the question is that today I was upgrading my Cumulus Linux switches and lost contact with them through our VPN box (pfSense), which is connected to the switch pair (one link per switch) in a master/slave configuration. We previously had it in LACP, but had major reliability issues with it.

When you do a major upgrade on Cumulus you basically have to re-image the device completely and restore all your settings. When the switch rebooted into the fresh configuration, it provided link-beats to the connected devices. The pfSense box saw the link-beat, figured everything was OK and used that link as its master. The problem is that, as a fresh device, the switch goes nowhere and my remote access vanished.
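To make the failure mode concrete, here is a toy Python sketch of how I understand a link-state-only failover choice to work. It is not pfSense code; the `Port` class, the `forwards` flag, and `pick_master` are hypothetical names I made up for illustration, and the "first port with link wins" rule is my assumption about how the master gets chosen.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    link_up: bool   # physical link-beat present
    forwards: bool  # hypothetical flag: does the far side actually pass traffic?


def pick_master(ports):
    """Failover-style choice (assumed): the first port with link up wins.

    Nothing here checks whether the far end is configured, only link state.
    """
    for port in ports:
        if port.link_up:
            return port
    return None


ports = [
    Port("port1", link_up=True, forwards=False),  # freshly imaged switch
    Port("port2", link_up=True, forwards=True),   # still-configured switch
]

master = pick_master(ports)
print(master.name, master.forwards)
```

In this model the freshly imaged switch on port 1 is selected because it has link, even though it forwards nothing, which matches what I saw.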

I managed to recover this with some modest on-site help (“please remove the cable in port 1”). Going forward I can fix this by modifying the pfSense LAGG to only use the switch I am not about to do maintenance on.

So, the next switch pair I have to do is connected to a single Cisco switch via LACP. The internet runs on a VLAN through this trunk. I don’t want to get this one wrong.

My expectation is that Cisco’s LACP should be smart enough to realize that the switch on the other end isn’t configured properly, and so won’t blackhole half or all of my traffic, but “should” is kind of a nebulous deal.
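Spelled out as a toy Python model, this is the behaviour I am hoping for. It is a sketch of my expectation, not verified Cisco behaviour: I am assuming that a member link only carries traffic while LACPDUs are being received from the far end, and that a freshly imaged switch with no bond configured sends none. `Link`, `partner_system_id`, and `bundled_links` are hypothetical names, and real LACP checks more than this (keys, timers, and so on).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    name: str
    link_up: bool
    # None models "no LACPDUs heard from the far end" (assumed for a fresh switch)
    partner_system_id: Optional[str]


def bundled_links(links):
    """Toy rule: a link joins the bundle only if it is up AND a partner is heard."""
    return [
        link.name
        for link in links
        if link.link_up and link.partner_system_id is not None
    ]


links = [
    Link("to-switch-a", link_up=True, partner_system_id=None),            # re-imaged
    Link("to-switch-b", link_up=True, partner_system_id="partner-b-id"),  # configured
]

active = bundled_links(links)
print(active)
```

If that assumption holds, only the link to the still-configured switch stays in the bundle. If the port-channel were static rather than negotiated, this check would not exist and link state alone would decide, which is exactly the case I am worried about.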

Is my instinct that LACP will do the right thing correct, or should I force the matter by disabling a physical interface in the LACP trunk on the Cisco side to force all the traffic down to whichever switch I am not updating? Should I even do that, or is there some better approach that I am missing?

Thanks,

2inch


