Friday, October 5, 2018

Charter outage / bonding question

For most of today there was some sort of Charter/Spectrum outage around Atlanta, but the symptoms are kind of baffling.

Site J only has Charter, and has IPsec VPNs to sites D, C, and V. J is running pfSense - we upgraded it through several versions. The other sites are a mixture of pfSense versions and SRX versions. Until about 4AM this morning everything had worked for at least a year.

VPN from J to D was not interrupted at all.

VPN from J to V passed data until it needed to re-key, and then died. Packet captures show that IKE sent from V was not arriving at J.

VPN from J to C passed IKE and was able to re-key, but no data would flow. Packet captures show that ESP sent from C was not arriving at J, but IKE was.

So basically it was like Charter was filtering inbound traffic, in a random but consistent way - like some sort of header-based hash. Pings and tests from various other locations behaved similarly - some would work, some would not, but the same test always passed or always failed.
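To make the "header-based hash" idea concrete, here is a rough sketch of how a bundle of parallel links could produce exactly this pattern if one member stopped passing traffic but was never pulled from the bundle. Everything in it is an assumption for illustration: the four-link bundle, the dead member, the CRC32 stand-in for whatever hash the real gear uses, and the documentation-range addresses. Nothing here is known about Charter's actual network.

```python
import zlib

def pick_member(flow, n_links):
    """Map a flow's header fields to one member link, identically every time."""
    key = "|".join(str(field) for field in flow).encode()
    # CRC32 is only a stand-in; real equipment uses vendor-specific hashes.
    return zlib.crc32(key) % n_links

def delivered(flow, n_links, dead_links):
    """A flow is black-holed when its hash lands on a dead member."""
    return pick_member(flow, n_links) not in dead_links

N_LINKS = 4   # assumed bundle size
DEAD = {2}    # assumed: one member silently dropping, still in the bundle

# (src, dst, protocol, src_port, dst_port); all addresses are made up.
# ESP has no ports, so it hashes as a different flow than IKE (UDP 500)
# even between the same two endpoints.
flows = {
    "IKE V->J": ("198.51.100.10", "203.0.113.5", "udp", 500, 500),
    "ESP V->J": ("198.51.100.10", "203.0.113.5", "esp", 0, 0),
    "IKE C->J": ("192.0.2.20", "203.0.113.5", "udp", 500, 500),
    "ESP C->J": ("192.0.2.20", "203.0.113.5", "esp", 0, 0),
}

for name, flow in flows.items():
    member = pick_member(flow, N_LINKS)
    # Repeating the test never changes the outcome for a given flow.
    outcomes = {delivered(flow, N_LINKS, DEAD) for _ in range(5)}
    status = "pass" if outcomes == {True} else "fail"
    print(f"{name}: member {member}, always {status}")
```

Under this model each flow is pinned to one member, so a given test either always works or always fails, and IKE and ESP between the same pair of sites can land on different members. That would match one tunnel losing only IKE while another loses only ESP.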

Is there some sort of LACP-like WAN link used at ISPs that can't recover if one of its paths has failed? At the end of the day there's nothing to do but wait for Charter to fix it, but I'd like to understand what's happening.


