Wednesday, October 6, 2021

Help Understanding an Outage

We had an outage caused by a single device that affected every device connected to our WAN switching in the VLAN for provider 1. Devices connected to the other provider in a separate VLAN functioned fine.

Ultimately, rebooting this device fixed the problem. However, I'm a bit puzzled as to what the issue actually was, and I'm curious what thoughts others may have and/or ways to mitigate this, if possible.

I added the flair for switching because I'm assuming it's an L2 problem.

More detail:

Connectivity:

  • Cisco WAN switches running vPC

  • Provider 1 connected in one VLAN, with upstream HSRP routers as the gateway

  • Provider 2 connected in a different VLAN, with an upstream router as the gateway

  • An HA pair of SDWAN appliances running VRRP, connected to provider 1 and provider 2 in each VLAN

  • Other edge devices, such as firewalls in HA, connected to each provider

Issue:

  • All devices connected to provider 1 started experiencing ~80% packet loss and high latency, and were suddenly unable to ping 8.8.8.8

  • All devices that had a connection to provider 2 had no issues using that path
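One rough way to put numbers on the difference between the two paths is to compare loss per provider from a set of ping probes. This is only a sketch: the probe lists, the `loss_percent` and `degraded_paths` helpers, and the 5% threshold are all made up for illustration and are not taken from the actual outage data.

```python
def loss_percent(results):
    """Return packet loss as a percentage for a list of probe outcomes.

    Each entry is True if a reply was received, False if the probe was lost.
    """
    if not results:
        raise ValueError("no probes recorded")
    lost = sum(1 for ok in results if not ok)
    return 100.0 * lost / len(results)


def degraded_paths(probes, threshold=5.0):
    """Return the names of paths whose loss exceeds the (assumed) threshold."""
    return sorted(
        name for name, results in probes.items()
        if loss_percent(results) > threshold
    )


# Hypothetical probe outcomes toward 8.8.8.8, one list per provider path.
probes = {
    "provider1": [True, False, False, False, False] * 4,  # 4 of 20 answered
    "provider2": [True] * 20,                             # all answered
}

for name, results in probes.items():
    print(f"{name}: {loss_percent(results):.0f}% loss")
print("degraded:", degraded_paths(probes))
```

With a split like this, only one path shows up as degraded, which is what made the problem look like an upstream issue with provider 1.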

To me, this indicated WAN switching was probably OK. We've had some failures before, so it was a concern. I checked critical interfaces for any errors that could indicate a problem. The symptoms appeared to point to an upstream issue with provider 1. We opened a ticket with provider 1, but they aren't seeing any issues and don't have any other reports of issues.

We noticed some oddities on the SDWAN appliances: the tunnels were bouncing on both appliances on both circuits, even though provider 2 was functioning fine. We also noticed errors with the HA status dropping in and out.
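The bouncing described here can be made concrete by counting state changes in a series of status samples. A minimal sketch, assuming the tunnel and HA states are available as simple lists of "up"/"down" samples; the sample data and the `count_flaps` helper are hypothetical, not an SDWAN vendor API or real log output.

```python
def count_flaps(states):
    """Count transitions between consecutive samples that differ."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Hypothetical status samples taken at a fixed interval.
samples = {
    "tunnel via provider1": ["up", "down", "up", "down", "up", "down", "up"],
    "tunnel via provider2": ["up", "down", "up", "up", "down", "up", "up"],
    "HA status":            ["up", "up", "down", "up", "down", "down", "up"],
}

for name, states in samples.items():
    print(f"{name}: {count_flaps(states)} transitions")
```

If tunnels over a circuit that is otherwise passing traffic cleanly are flapping as well, that may point at the appliances rather than the circuit, though that is a hint and not a diagnosis.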

I decided to reboot the SDWAN appliances because the flapping was affecting some of our troubleshooting, and magically, after the reboot, everything started working. Packet loss and latency went away, pings to 8.8.8.8 immediately started responding again, and all other edge devices began to operate normally.

We opened a ticket with the SDWAN vendor for analysis on their end and are waiting to hear back.


