Thursday, February 22, 2018

Traffic going through backup IPsec tunnel even when primary tunnel is up?

OK, so this is a weird situation. I'm also a noob when it comes to this, so bear with me.

I work with a company that maintains a dual-hub DMVPN network. We have around 300 remote sites that connect back to ours via site-to-site VPN. At each location we have a single static IP from the ISP (the ISP differs by location), with the ISP's modem in bridge mode. There's nothing special about the modem or the connection from the ISP. It's just like buying a modem off the shelf and plugging it in at home, the only difference being that we have a single static IP. We use VRF routing, with Tunnel0 being the main WAN tunnel and Tunnel1 being the backup tunnel through the cellular interface. The configuration on the problem router is the same as on every other site we manage.

Here's where it gets weird. I first noticed this issue when trying to ping devices behind the router. At every other site, any device I ping replies consistently in 33-43 ms. At the problem site, the reply times are anywhere from 57 to 365 ms.

So I started looking into it. When I do a traceroute from my local computer to devices at other sites, I can see the traffic going through Tunnel0. For the problem site, I can see the traffic going through Tunnel1.

This is the part I really don't understand:

Pinging Tunnel0 from within another site's router. (Notice the ping times)

> Sending 5, 100-byte ICMP Echos to Tunnel0_IP, timeout is 2 seconds:
> !!!!!
> Success rate is 100 percent (5/5), round-trip min/avg/max = 44/47/52 ms

Pinging Tunnel0 from within the problem site's router. (Notice the ping times)

> Sending 5, 100-byte ICMP Echos to Tunnel0_IP, timeout is 2 seconds:
> !!!!!
> Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
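In case anyone wants to compare these numbers across more than two routers, here is a rough Python sketch that pulls the round-trip figures out of the ping summary line. The function name `parse_ping_summary` is my own, and it assumes the summary line looks exactly like the output quoted above; it only reads text I've already collected and doesn't connect to any router.

```python
import re

# Matches the summary line of the ping output quoted above.
# The round-trip part is optional because it may be absent
# when no replies come back (assumption on my part).
SUMMARY_RE = re.compile(
    r"Success rate is (\d+) percent \((\d+)/(\d+)\)"
    r"(?:, round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms)?"
)


def parse_ping_summary(text):
    """Return a dict of ping statistics, or None if no summary line is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    stats = {
        "success_pct": int(match.group(1)),
        "received": int(match.group(2)),
        "sent": int(match.group(3)),
    }
    if match.group(4) is not None:
        stats["min_ms"] = int(match.group(4))
        stats["avg_ms"] = int(match.group(5))
        stats["max_ms"] = int(match.group(6))
    return stats


other_site = "Success rate is 100 percent (5/5), round-trip min/avg/max = 44/47/52 ms"
problem_site = "Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms"

for name, output in [("other site", other_site), ("problem site", problem_site)]:
    stats = parse_ping_summary(output)
    print(name, stats["avg_ms"], "ms average")
```

With the two outputs above, this reports a 47 ms average from the other site and a 1 ms average from the problem site, which is the gap I can't explain.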

When I do a traceroute from another site's router to Tunnel0:

> Tracing the route to Tunnel0_IP
> VRF info: (vrf in name/id, vrf out name/id)
> 1 X.X.1.1 24 msec 24 msec 24 msec
> 2 Tunnel0_IP 48 msec * 44 msec

But when I do a traceroute from the problem site's router to Tunnel0 (it completely bypasses the first hop):

> Tracing the route to Tunnel0_IP
> VRF info: (vrf in name/id, vrf out name/id)
> 1 Tunnel0_IP 0 msec * 0 msec

And finally, a show ip route on another router:

Gateway of last resort is Tunnel0_IP to network 0.0.0.0

But with the problem router:

Gateway of last resort is Tunnel1_IP to network 0.0.0.0
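Since we have around 300 sites, I'd also like to know whether any others are quietly sitting on the backup tunnel. Here is a minimal Python sketch of how I'd check that, assuming I've already saved each router's show ip route output as text. The helper names `default_gateway` and `sites_not_on_primary` are hypothetical, and `Tunnel0_IP` / `Tunnel1_IP` are the same placeholders used above, not real addresses.

```python
import re

# Matches the "Gateway of last resort" line as quoted above.
GATEWAY_RE = re.compile(r"Gateway of last resort is (\S+) to network (\S+)")


def default_gateway(show_ip_route_output):
    """Return the gateway of last resort from the output, or None if no such line matches."""
    match = GATEWAY_RE.search(show_ip_route_output)
    return match.group(1) if match else None


def sites_not_on_primary(outputs, primary_gateway):
    """Return the sorted names of sites whose default gateway is not the primary one.

    outputs maps a site name to that router's saved "show ip route" text.
    Sites with no matching line are flagged too, since they are not on the primary.
    """
    return sorted(
        site
        for site, text in outputs.items()
        if default_gateway(text) != primary_gateway
    )


# Placeholder data mirroring the two outputs in this post.
saved_outputs = {
    "other-site": "Gateway of last resort is Tunnel0_IP to network 0.0.0.0",
    "problem-site": "Gateway of last resort is Tunnel1_IP to network 0.0.0.0",
}

print(sites_not_on_primary(saved_outputs, "Tunnel0_IP"))
```

This only tells me which sites have the wrong default route, not why, but it would at least show whether the problem site is the only one.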

I've checked the config for the problem router multiple times and everything is the same as other routers. What could be causing this?


