Thursday, March 28, 2019

Mystery L2 Issue with Juniper Switches

I'm dealing with an issue that has left me perplexed.

A little background: we recently configured two separate stacks of three ex4800s as redundant access layers for our office. Each user has two ports at their desk, one going to each stack. Both stacks trunk via LACP links to two ex4600s that sit in a virtual chassis. The virtual chassis sits in front of our office router and a separate Cisco 2921 that handles our VoIP network.

The configurations on the two access stacks are essentially identical. The same VoIP VLAN is configured on every port, and phones work without issue. That is, until we activate both stacks. If both access stacks are trunking to the aggregation VC, we get what appears to be a loop confined specifically to our VoIP VLAN. Phones periodically lose connection for a time, until eventually every phone in the office is down. During maintenance windows everything appears fine, but as users arrive during the day the outage rolls in with them. Eventually the stack becomes overwhelmed and requires a reboot, but the phones are always the first victims.

Both stacks see the aggregation VC as their root bridge, and both have RSTP and bpdu-block-on-edge enabled. When we span the traffic during the outages we see TONS of DHCP traffic, but I'm not sure whether that's a red herring. Since I'm not familiar with DHCP on Cisco routers, I moved DHCP off the Cisco router and onto the AD instance that we use for all of our other subnets, but that hasn't resolved anything.
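
One way to tell whether the DHCP flood is a symptom of a loop rather than its cause would be to check the span capture for byte-identical frames that show up many times over. Below is a minimal Python sketch of that check. The names `duplicate_frame_report` and `min_copies` are made up for illustration, the threshold of 5 is an arbitrary assumption, and the function assumes the frames have already been pulled out of the capture as raw bytes.

```python
from collections import Counter
import hashlib


def duplicate_frame_report(frames, min_copies=5):
    """Return (digest, count) pairs for frames seen at least min_copies times.

    frames: iterable of raw Ethernet frames as bytes objects.
    min_copies: hypothetical threshold; a frame repeated this often is
    treated as suspicious. Tune it for the length of the capture.
    """
    # Hash each frame so that only byte-for-byte identical frames collide.
    counts = Counter(hashlib.sha1(frame).hexdigest() for frame in frames)
    # most_common() sorts by count, highest first.
    return [(digest, n) for digest, n in counts.most_common() if n >= min_copies]
```

If the same DHCP broadcast turns up dozens of times within a short capture, that would point toward frames circulating in a loop. A large number of distinct requests would instead suggest the clients themselves are generating the load. A client retransmitting its own request can also produce near-identical frames, so the result is a hint, not proof.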

There are no blatantly obvious loops in the network as far as I can tell.

For now, we are running on one stack without issue, but I'd like to use both since the company has paid for them.

Is there something obvious I'm missing?


