Thursday, September 2, 2021

Routing drops to a single site across a WAN link

I have two buildings (A & B) that connect to our network provider's WAN via 10 Gb links. All other buildings (15+) connect via 1 Gb links. My equipment does not see the provider's equipment at all; my switches only see each other.

Simple static routes:

ip route 10.1.0.0 255.255.0.0 172.16.1.1 (building A)

ip route 10.2.0.0 255.255.0.0 172.16.1.2 (building B)

ip route 10.3.0.0 255.255.0.0 172.16.1.3 (building C)

and so on
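To make the lookup behavior of these routes concrete, here is a minimal Python sketch of longest-prefix matching over the three example routes listed above (the remaining buildings are elided in the post, so they are not modeled). The `lookup_next_hop` helper is hypothetical and only illustrates how a routing table picks a next hop; it is not what the Dell or HP switches actually run.

```python
import ipaddress

# Static routes from the post: destination network -> next hop.
# Only buildings A, B and C are modeled; the rest are elided ("and so on").
STATIC_ROUTES = [
    (ipaddress.ip_network("10.1.0.0/255.255.0.0"), ipaddress.ip_address("172.16.1.1")),  # building A
    (ipaddress.ip_network("10.2.0.0/255.255.0.0"), ipaddress.ip_address("172.16.1.2")),  # building B
    (ipaddress.ip_network("10.3.0.0/255.255.0.0"), ipaddress.ip_address("172.16.1.3")),  # building C
]


def lookup_next_hop(destination, routes=STATIC_ROUTES):
    """Return the next hop for `destination` using longest-prefix match, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None  # no matching route: the packet would be dropped
    # The most specific (longest) prefix wins, as in a real routing table.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


if __name__ == "__main__":
    for dest in ("10.1.0.1", "10.2.5.10", "10.3.0.1", "192.0.2.1"):
        print(dest, "->", lookup_next_hop(dest))
```

With only /16 routes and no default route in this table, any destination outside 10.1/16, 10.2/16 and 10.3/16 has no next hop.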

Buildings A and B have static routes to every building, because they host the servers and internet access used by the other buildings.

Buildings C and up have three static routes: one each for the A and B networks, plus a default route (0.0.0.0) to either A or B, depending on where I want internet traffic to exit the network.
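Because the routes for buildings C and up follow a fixed pattern, they could in principle be generated rather than typed by hand. The sketch below is hypothetical: the `building_routes` helper and the assumption that the A and B routes point at 172.16.1.1 and 172.16.1.2 (the WAN addresses from the route list above) are illustrative, and the output simply reuses the post's `ip route` syntax.

```python
import ipaddress

# Assumed WAN next hops for the two hub buildings (taken from the route list in the post).
HUBS = {
    "A": ("10.1.0.0/16", "172.16.1.1"),
    "B": ("10.2.0.0/16", "172.16.1.2"),
}


def building_routes(internet_exit):
    """Return the three static routes for a spoke building (C and up).

    One route each for the A and B networks, plus a default route
    (0.0.0.0 0.0.0.0) pointing at whichever hub should carry internet traffic.
    """
    if internet_exit not in HUBS:
        raise ValueError("internet_exit must be 'A' or 'B'")
    lines = []
    for prefix, next_hop in HUBS.values():
        net = ipaddress.ip_network(prefix)
        # Emit dotted-quad netmask form, matching the post's config lines.
        lines.append(f"ip route {net.network_address} {net.netmask} {next_hop}")
    default_hop = HUBS[internet_exit][1]
    lines.append(f"ip route 0.0.0.0 0.0.0.0 {default_hop}")
    return lines


if __name__ == "__main__":
    print("\n".join(building_routes("A")))
```

Switching a building's internet exit from A to B then only changes the last (default) route; the routes to the A and B networks stay the same.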

The issue shows up between buildings A and B, the two sites on 10 Gb links.

Buildings A and B lose the ability to talk to each other directly over their respective 10 Gb links.

While this is happening, A and B can still talk to the other buildings on 1 Gb links, still over their own 10 Gb links.

Building A can still reach building B if I route the traffic through building C.

Rebooting the core switch at building B resolves the issue for about 15 to 25 hours.

Before the first reboot, the switch had been up for 80+ days.

No config or firmware changes were made to either switch at building A or B in the weeks prior.

A new site/link was added to the provider's WAN in late June without issue.

Nothing jumps out in the event logs of the core switches at buildings A and B. Basically nothing is logged on either side before the random loss of connection.

The two switches are different models, but the current config has been in place for over 12 months with this network provider, and the switches themselves have been in service for 8 years or more.

Building A = Dell PowerConnect 8000 series, aka Force N4000 series

Building B = HP 5406zl

I had a second Dell switch at building A as a spare.

I moved it to building B, set up the WAN port the same way as on the HP 5406, and swapped the fiber over from the 5406. The 10 Gb link comes up and shows connected.

ping 10.1.0.1 or 172.16.1.1 - fails

ping 10.3.0.1 or 172.16.1.3 - GOOD, no drops.

like WTF????

Switch the fiber back to the 5406... ping 10.1.0.1 / 172.16.1.1 - GOOD, no drops.

Switch back to the Dell... the failure occurs again.

The network provider says nothing has changed with their config or equipment.

Now I'm just waiting for the connection loss to happen again sometime tomorrow, like it has every day this week.

I'll provide configs if desired when I go back in the morning.

Any thoughts?


