Tuesday, March 19, 2019

Strangest AWS stuff happening over VPN tunnels

So I have been experiencing some strange behavior with AWS and VPN tunnels.

To simplify a rather weird request from the business, we've built a tunnel to a VPC specifically for 6 hosts. That is to say, all traffic from my office to these 6 hosts is NAT'ed to a specific 192.168. address, which is added as a static route in the AWS route table so that it goes over the tunnel.

EVERYTHING works fine, but for 5 minutes every day, traffic to one host drops. As of yesterday I have ping monitoring alerts set up for all 6 hosts; before that, I was only monitoring two of the 6.

Pings drop to one instance but not the other. Of course the tunnel is getting blamed, and I hear "THE TUNNEL IS DOWN!" from everyone, even though I am getting ping replies from a second host.

The firewall in question is a Palo Alto. I called Palo Alto TAC and we checked the logs: during the "downed period" you can see two log entries for traffic to instance A that get no reply, but between those two entries there is an entry for instance B that does get a reply.

To better explain:

3:01 AM - no reply to instance A

3:03 AM - reply from instance B

3:05 AM - still no reply from instance A

There are no log entries or anything stating the tunnel went down.
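The timeline above is really the key evidence: if instance B answers in the middle of instance A's outage, packets are still crossing the tunnel. Here is a minimal Python sketch of that reasoning, assuming ping results are collected as (timestamp, host, got_reply) tuples. That tuple format, the `classify_window` function and the date are hypothetical; this is not the output format of the Palo Alto or of any particular monitoring tool.

```python
from datetime import datetime


def classify_window(probes, start, end):
    """Classify ping results that fall inside [start, end].

    probes is a list of (timestamp, host, got_reply) tuples (hypothetical format).
    Returns "no-data", "healthy", "host-specific" or "tunnel-wide".
    """
    window = [(host, ok) for ts, host, ok in probes if start <= ts <= end]
    if not window:
        return "no-data"
    failing = {host for host, ok in window if not ok}
    # Hosts that replied and never missed a reply inside the window.
    answering = {host for host, ok in window if ok} - failing
    if not failing:
        return "healthy"
    if answering:
        # Some host kept replying, so the tunnel was still carrying traffic.
        return "host-specific"
    return "tunnel-wide"


# The 3:01 / 3:03 / 3:05 AM entries from the firewall logs (the date is a placeholder).
day = datetime(2019, 3, 18)
probes = [
    (day.replace(hour=3, minute=1), "instance A", False),
    (day.replace(hour=3, minute=3), "instance B", True),
    (day.replace(hour=3, minute=5), "instance A", False),
]
print(classify_window(probes, day.replace(hour=3, minute=0), day.replace(hour=3, minute=6)))
# host-specific
```

With only instance A being monitored, the same window would come back as "tunnel-wide", which is exactly why watching more than one host matters here.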

I've called AWS, but they are stuck on looking at the tunnel itself and aren't really thinking outside the box about routing at the VPC level.
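One VPC-level check is to confirm that each subnet's route table sends the NAT'd office address back toward the tunnel. AWS picks the most specific matching route, so an overlapping route in one subnet's table could send that host's replies somewhere else. Below is a small Python sketch of longest-prefix matching against hand-copied route tables. The address, CIDRs and targets (`igw-example`, `pcx-example`, `vgw-example`) are all hypothetical, and a misrouted table like this would cause a constant failure rather than a five-minute daily one, so treat it as a way to rule routing in or out, not as a diagnosis.

```python
import ipaddress


def lookup(route_table, address):
    """Return the target of the most specific route containing address, or None.

    route_table is a list of (destination_cidr, target) pairs, a simplified
    stand-in for one VPC route table. The longest matching prefix wins.
    """
    ip = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical values: a stand-in NAT'd office address and two subnets' route tables.
NAT_ADDRESS = "192.168.100.10"
route_tables = {
    "subnet of instance A": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-example"),
                             ("192.168.0.0/16", "pcx-example")],
    "subnet of instance B": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-example"),
                             ("192.168.100.10/32", "vgw-example")],
}
for subnet, table in route_tables.items():
    print(subnet, "->", lookup(table, NAT_ADDRESS))
```

In this made-up example, replies from instance A's subnet would follow the peering route while instance B's go to the virtual private gateway, which is the kind of per-subnet difference that looking only at the tunnel would miss.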

I'll probably have to set up some flow logs, but I am stuck on how this could be happening.
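Once flow logs are enabled, the records for instance A's network interface during the outage window should show whether the pings reach it and whether replies leave it. A minimal Python sketch, assuming the default (version 2) flow log format, whose space-separated fields are version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action and log-status. The sample records, addresses and interface ID below are made up, and `icmp_summary` is a hypothetical helper, not part of any AWS tool.

```python
# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]


def parse_record(line):
    """Split one default-format flow log line into a dict keyed by FIELDS."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
    return dict(zip(FIELDS, values))


def icmp_summary(lines, instance_ip, peer_ip):
    """Count ICMP packets between instance_ip and peer_ip by (direction, action)."""
    summary = {}
    for line in lines:
        record = parse_record(line)
        if record["protocol"] != "1":  # ICMP is IP protocol 1; NODATA rows have "-"
            continue
        pair = (record["srcaddr"], record["dstaddr"])
        if pair == (peer_ip, instance_ip):
            direction = "inbound"
        elif pair == (instance_ip, peer_ip):
            direction = "outbound"
        else:
            continue
        key = (direction, record["action"])
        summary[key] = summary.get(key, 0) + int(record["packets"])
    return summary


# Made-up records for instance A (10.0.1.20) and a hypothetical NAT'd address.
sample = [
    "2 123456789012 eni-0example 192.168.100.10 10.0.1.20 0 0 1 3 252 1553000460 1553000520 ACCEPT OK",
    "2 123456789012 eni-0example 10.0.1.20 192.168.100.10 0 0 1 3 252 1553000460 1553000520 ACCEPT OK",
    "2 123456789012 eni-0example - - - - - - - 1553000520 1553000580 - NODATA",
]
print(icmp_summary(sample, "10.0.1.20", "192.168.100.10"))
# {('inbound', 'ACCEPT'): 3, ('outbound', 'ACCEPT'): 3}
```

Roughly: inbound ACCEPT rows with no matching outbound rows during the outage would suggest the requests reach instance A's interface but no replies are logged leaving it, while no inbound rows at all would suggest the requests never arrive. Flow log records are aggregated over a capture window, so they won't line up to the exact minute with the firewall logs.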

This weekend I will most likely scrap the tunnel completely and build it again from scratch.


