Thursday, November 29, 2018

VPN Tunnel Goes Down During IPsec (Phase 2) Auto Re-negotiation

Hello all, I hope this question is acceptable here.

We have a SonicWALL NSA 2600 at main site and a SonicWALL FV-400 at remote site.

A site-to-site VPN tunnel between them had been working flawlessly for about 2 years.

Approximately one month ago, we began having an issue where the tunnel would go down at around the same time every day, then magically heal itself and come back online about 15 minutes later.

After a few days, we realized that it went down at the exact same time that IPsec (Phase 2) was set to auto re-negotiate. That meant we could now predict precisely when it would happen, but we still didn't know the cause. It didn't make sense that the tunnel had worked for 2 years straight and then randomly started having this issue.

We have SonicWALL support on both devices, but they are absolutely useless and clueless about their own equipment.

I've had them make minor tweaks here and there to the tunnel settings, but they were all guesses and none of them fixed the issue.

I've called 5 times now to get this figured out and each time, they make minor tweaks and tell me to call in again if it keeps happening. Well, it keeps happening, and it's severely impacting our business now, not to mention my reputation with our business owner. It baffles me that after a month, SonicWALL is still in the "guessing stage" with our issue.

They've done numerous packet captures, log exports, etc. They have NO CLUE what's happening. I was at least able to get escalated to a senior-level tech today, but he and I spent 2 hours on the phone and he still didn't know what was going on.

One thing we DID discover today, however, is that when we change the IPsec proposal protocol from ESP to AH, the tunnel instantly starts working. But when it's on ESP, it takes 15-45 minutes to start working. During that time, absolutely no traffic passes between the two subnets, yet both SonicWALLs show a green light indicating that the tunnel is up.

I've been using ESP protocol for 2 years just fine.

Why would it suddenly start exhibiting this behavior, seemingly out of nowhere?

No changes were made to either router when the issue presented itself. However, since the issue began, we have updated to the latest firmware on both sides in an attempt to resolve it. No luck.

Also, I have a TZ-105 at my house with ESP-based tunnels going to both sites, and those tunnels are rock solid. They stay up at all times. So when SonicWALL tried to suggest that perhaps AT&T was blocking ESP, I was able to refute that because:

1) It worked as ESP for 2 years. Why would AT&T suddenly start blocking that with no warning?

2) If ESP were suddenly being blocked by the ISP, it wouldn't start working again 15-45 minutes after re-negotiation. It just wouldn't work, period.

3) If ESP were being blocked, the ESP tunnels I have at home, going to both the remote site and main site, would have to be affected too, right?

For now, I have put a band-aid on the problem by setting the Phase 2 re-negotiation between the remote and main sites to occur every 24 hours at midnight (8 hours is the default). The issue still exists, but no one is in the office to notice it. It's bugging the hell out of me, and I'm open to suggestions since SonicWALL is utterly useless.
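To make the timing explicit: the outage is predictable because the Phase 2 SA re-negotiates on a fixed lifetime counted from when it was last established. Here is a minimal sketch, assuming re-negotiation fires exactly when the lifetime expires (real devices often rekey somewhat earlier, and the schedule shifts whenever the tunnel is re-established at a different time). The midnight start time and the `rekey_times` helper are hypothetical, for illustration only.

```python
from datetime import datetime, timedelta


def rekey_times(established: datetime, lifetime_s: int, count: int) -> list[datetime]:
    """Return the next `count` Phase 2 re-negotiation times.

    Assumption: the SA renegotiates exactly at lifetime expiry and each new
    SA starts a fresh lifetime at that moment.
    """
    step = timedelta(seconds=lifetime_s)
    return [established + step * i for i in range(1, count + 1)]


# Hypothetical: the SA was (re)established at midnight.
start = datetime(2018, 11, 29, 0, 0)

default_8h = rekey_times(start, 8 * 3600, 3)   # 28800 s, the 8-hour default
daily_24h = rekey_times(start, 24 * 3600, 3)   # 86400 s, the 24-hour band-aid

for label, times in (("8h", default_8h), ("24h", daily_24h)):
    print(label, [t.strftime("%Y-%m-%d %H:%M") for t in times])
```

With the 8-hour default, a re-negotiation (and, here, an outage) lands three times a day, at least one of them during business hours; with a 24-hour lifetime anchored at midnight, every re-negotiation stays at midnight, which is why the band-aid hides the problem without fixing it.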

Thanks everyone!


