Monday, December 14, 2020

Odd OSPF failure

We are running a Silver Peak SD-WAN for our remote sites. Our two hub sites each have an HA pair of Silver Peaks that sit in OSPF Area 0 along with the core routers at each site.

Site B's primary Silver Peak stopped communicating with the Silver Peak Cloud Portal and Orchestrator, and eventually stopped communicating over its WAN interfaces altogether.

The secondary took over, mostly. About 10% of the routes were somehow still clinging to the primary Silver Peak. The route table on Site B's core router showed it forwarding 90% of what it should to the secondary, but a small handful of subnets at Site A were still being forwarded to the primary for some reason.
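To put numbers on a split like that, here is a minimal sketch of how the core router's routes could be grouped by next hop once they have been exported as (prefix, next hop) pairs. Everything in it is an assumption for illustration: the addresses, the prefixes, and the helper names are hypothetical and are not taken from our network or from any Silver Peak or router tooling.

```python
from collections import defaultdict

# Hypothetical next-hop addresses (documentation range), not our real ones.
PRIMARY = "192.0.2.1"
SECONDARY = "192.0.2.2"


def group_by_next_hop(routes):
    """Map each next hop to the sorted list of prefixes forwarded to it."""
    grouped = defaultdict(list)
    for prefix, next_hop in routes:
        grouped[next_hop].append(prefix)
    return {hop: sorted(prefixes) for hop, prefixes in grouped.items()}


def share_by_next_hop(routes):
    """Return the fraction of routes that point at each next hop."""
    grouped = group_by_next_hop(routes)
    total = sum(len(prefixes) for prefixes in grouped.values())
    return {hop: len(prefixes) / total for hop, prefixes in grouped.items()}


# Made-up sample: nine prefixes moved to the secondary, one stayed behind.
routes = [(f"10.1.{i}.0/24", SECONDARY) for i in range(9)]
routes.append(("10.1.9.0/24", PRIMARY))

stuck_on_primary = group_by_next_hop(routes).get(PRIMARY, [])
shares = share_by_next_hop(routes)
```

With real data, `stuck_on_primary` would list exactly which subnets never moved, which is the list worth comparing against what the primary was still advertising.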

We could ping the LAN-side interface of the primary at Site B, but that's about as responsive as it got. Once I shut down the switchport connecting the primary Silver Peak to the LAN at Site B, the core router's route table corrected itself and started forwarding those last few subnets to the secondary Silver Peak.

Any idea how something like that could happen? I thought that, with OSPF being a link-state protocol, it should fail over either all of those routes or none of them.


