Wednesday, July 11, 2018

MAC flap took down the whole network

Hello folks, this is one of those posts that excites people... lol jk.

Today, someone turned on a switch port, which took down almost all access to everything for everyone on the network for about 10 minutes until we resolved it.

https://imgur.com/a/wtJLyIA

The drawing linked above shows the topology. We have 2 Nexus cores, joined by a peer link, that attach to many switches via vPC; in this example it is our building distribution switch. Our building dist switch connects to a building access switch. VLAN 5 is like a black-hole VLAN: it's on all our switch ports by default and is meant to not give access to anything until we put a real VLAN in there.

We also have a new Ciena router that is supposed to hook up our main site here to a DR site far away. So far it's only connected here, not at the DR site yet. The Ciena has 2 connections going to each Nexus core, configured as switchport access vlan 5 on our cores. The Ciena is also powered on.

So my co-worker went to activate a random switch port on a random access switch in one of the buildings. He just turned it on... next thing you know, tickets start pouring in about loss of connectivity, and we get 1 log on the distribution switch saying:

building distribution switch#

%SW_MATM-4-MACFLAP_NOTIF: Host 085f.51y6.5626 in vlan 5 is flapping between port Po1 and port Po2

%SW_MATM-4-MACFLAP_NOTIF: Host 085f.51y6.5626 in vlan 5 is flapping between port Po1 and port Po2
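Side note for anyone who wants to check how often a message like this really shows up in an exported syslog file: here is a minimal Python sketch that pulls the fields out of a MACFLAP_NOTIF line and counts hits per host and VLAN. The function names (parse_macflap, count_flaps) are made up for illustration, and the pattern is based only on the message format shown above, so other platforms or software versions may word it differently.

```python
import re
from collections import Counter

# Pattern based only on the message format shown in this post.
# The MAC field is matched loosely (\S+) because addresses pasted
# into posts are often obfuscated and may not be valid hex.
MACFLAP_RE = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<port_a>\S+) and port (?P<port_b>\S+)"
)


def parse_macflap(line):
    """Return (mac, vlan, port_a, port_b) for a MACFLAP line, else None."""
    match = MACFLAP_RE.search(line)
    if match is None:
        return None
    return match["mac"], int(match["vlan"]), match["port_a"], match["port_b"]


def count_flaps(lines):
    """Count MACFLAP notifications per (mac, vlan) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_macflap(line)
        if parsed is not None:
            mac, vlan, _port_a, _port_b = parsed
            counts[(mac, vlan)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "%SW_MATM-4-MACFLAP_NOTIF: Host 085f.51y6.5626 in vlan 5 "
        "is flapping between port Po1 and port Po2",
        "some unrelated log line",
    ]
    for key, hits in count_flaps(sample).items():
        print(key, hits)
```

Feeding it the lines of a saved log (for example, the result of reading a text file line by line) gives a quick per-host count, which at least tells you whether the switch logged the flap once or many times.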

So all hell breaks loose until we shut off the port. My question is: how could this have been avoided? Do I need to configure broadcast storm thresholds? I would think spanning-tree would do something about this, but apparently not... I would also expect to see this log more than just the once I pasted above, right? I've seen loops like this with MAC flaps many times in the past, but to take down the network for everyone??? I've never seen it be bad to this magnitude!!
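To reason about where a loop could even exist in VLAN 5, one option is to write the topology down as a list of links and look for any link that closes a cycle. The Python sketch below does that with a small union-find. Everything in it is an assumption on my part, not a confirmed diagnosis: the node names are made up, the two cores are modeled as one logical switch (a simplification of how a vPC pair looks to its neighbors), the vPC to the dist switch counts as a single logical link, and the Ciena is assumed to bridge VLAN 5 between at least two unbundled uplinks without spanning-tree blocking either of them.

```python
def find_loop_links(links):
    """Return the links that close a cycle in an undirected graph.

    Uses union-find: a link whose two ends are already connected
    through earlier links is redundant, i.e. it forms a loop.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    closing = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            closing.append((a, b))
        else:
            parent[root_a] = root_b
    return closing


# Hypothetical model of VLAN 5 only; names are illustrative.
VLAN5_LINKS = [
    ("nexus-pair", "dist"),    # vPC port-channel, one logical link
    ("dist", "access"),        # building access switch uplink
    ("nexus-pair", "ciena"),   # access vlan 5 port toward the Ciena
    ("nexus-pair", "ciena"),   # second unbundled port toward the Ciena
]

if __name__ == "__main__":
    for link in find_loop_links(VLAN5_LINKS):
        print("link closes a loop:", link)
```

Under those assumptions the second Ciena link is the one that gets flagged. If the Ciena does not actually bridge between its ports, or spanning-tree does block one side, that link should be dropped from the list and the model reports no loop.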


