Saturday, June 13, 2020

Help troubleshoot partial collapse or routing issue on a VPN tunnel?

  • We have a site-to-site VPN to a server at Rackspace that is running a web app. This tunnel operates under the 192.168.1.0/22 subnet.

  • On the other side of that tunnel is an office location with a Cisco Meraki appliance. The local network at the office is using a 10.0.2.0/24 subnet.

  • The Meraki is also running a client VPN (using the 10.0.3.0/24 subnet) for external employees to access the LAN.
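As a quick sanity check on the addressing above, here is a minimal Python sketch using the standard `ipaddress` module. It normalizes each subnet and looks for overlaps. The dictionary labels are just for illustration. Note that 192.168.1.0/22 has host bits set, so it normalizes to 192.168.0.0/22. Whether that has anything to do with the problem is not established, but it may be worth confirming that both ends of the tunnel define the same network.

```python
import ipaddress

# Subnets as quoted in the post; the labels are only for illustration.
subnets = {
    "site-to-site tunnel": "192.168.1.0/22",
    "office LAN": "10.0.2.0/24",
    "client VPN": "10.0.3.0/24",
}


def check(cidr):
    """Return (network, aligned); aligned is False if host bits were set."""
    net = ipaddress.ip_network(cidr, strict=False)
    return net, str(net) == cidr


results = {name: check(cidr) for name, cidr in subnets.items()}
for name, (net, aligned) in results.items():
    note = "" if aligned else f" (written as {subnets[name]}, host bits set)"
    print(f"{name}: {net}{note}")

# Collect every pair of networks that overlap.
names = list(results)
overlaps = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if results[a][0].overlaps(results[b][0])
]
print("overlaps:", overlaps)
```

If the two sides of the tunnel were configured with different spellings of this network (one as 192.168.1.0/22, one as 192.168.0.0/22), that is the kind of mismatch this check would help spot.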

So here's the problem. At random times, the machines in the office (10.0.2.x) lose the ability to communicate with the server at Rackspace over the site-to-site VPN ... BUT clients using the office client VPN (10.0.3.x) continue working properly. At least for a while. Restarting the site-to-site tunnel fixes the problem.

This appears to be related to traffic coming from both office subnets (10.0.2.0/24 and 10.0.3.0/24): it never happens outside of business hours (when only the client VPN would be in use), and it only happens when I have a mixed load coming from inside the office and through the client VPN. Furthermore, it only happens once or twice a day.

Can anyone take a guess as to what might be failing, and where? Or what I might want to filter for in a Wireshark capture? As I mentioned, this happens during business hours, so I have to balance how much time I spend troubleshooting against getting the tunnel reestablished so people can work.

Any thoughts are appreciated.


