Friday, May 22, 2020

How do you explain upstream carrier issues to management?

https://www.news4jax.com/news/local/2020/05/21/att-blames-internet-issues-on-damaged-cable-in-jacksonville/

We are located in the NYC area, and are a Verizon customer.

Back story: Wednesday night we started getting alerts that our sites (they sit behind Akamai Site Shield) were failing to be polled from our main data center, while pretty much any other website on the internet was working. After a few hours of troubleshooting internally, we got our System Operation and Monitoring team to set up NetPath monitoring. Running a NetPath to the Akamai Edge Key our DNS was resolving to, we noticed traffic dying out in AT&T's network in Miami.

Looking at Verizon's BGP looking glass in New York for our Akamai endpoint, the AS path was Verizon -> AT&T -> Akamai EU -> Akamai US. On our Verizon edge router, I cleared the BGP neighbors and shut down our interface; all traffic failed over to our secondary circuit and everything started reporting as healthy. Looking at NetPath, traffic was now going Sungard -> Level3 -> Akamai EU -> Akamai US. I pretty much told everyone on the MIR (Manager Incident Response) call that we lucked out: our backup internet connection had a BGP peering other than AT&T. If it hadn't, the failover would have fixed nothing.

They then asked why polling was working out of our Campus DC. I explained that we are running ECMP across Verizon and LightTower with source/destination IP hashing. The SolarWinds poller running NetPath in our Campus happened to be taking LightTower, so it was reporting as up. Once everything reported as up in our main data center, I was asked which of our customers were affected. My response was that it depends. Looking at Comcast's BGP looking glass, they have a direct peering with Akamai, so as a Comcast customer I would have been fine. A customer on Verizon FIOS would have taken the same path through the bad AT&T segment (if their DNS resolved to the same Akamai Edge Key that ours did).
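For anyone who has to explain the ECMP part to a non-network audience, here is a rough sketch of why one poller can look healthy while another does not. It is only an illustration: real routers use their own vendor-specific hash, the `pick_link` helper is made up for this post, and the IP addresses are documentation-range placeholders, not our real pollers or the Akamai edge.

```python
import hashlib
import ipaddress


def pick_link(src_ip: str, dst_ip: str, links: list[str]) -> str:
    """Pick an uplink for a flow from a hash of its source and destination IP.

    Hypothetical stand-in for a router's ECMP hash. The point is only that
    the choice is deterministic per (source, destination) pair.
    """
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(src + dst).digest()
    index = int.from_bytes(digest[:4], "big") % len(links)
    return links[index]


UPLINKS = ["Verizon", "LightTower"]

# Placeholder addresses from the RFC 5737 documentation ranges.
AKAMAI_EDGE = "203.0.113.50"
for poller in ("192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"):
    print(poller, "->", pick_link(poller, AKAMAI_EDGE, UPLINKS))
```

The same source/destination pair always lands on the same uplink, so a poller that happens to hash onto LightTower never sees a problem that only exists on the Verizon path.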

I was then asked how we keep this from occurring again. I said we could get more diverse carriers, but without knowing all the upstream BGP peerings for every destination on the internet, it's a roll of the dice. I just replied we could buy every carrier located in the US :-D
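To make the "roll of the dice" point concrete, here is a small sketch that takes the two paths we saw toward Akamai and asks which circuits a given upstream failure would hit. The paths are the ones from the looking glass and NetPath above; the function names are made up for illustration, and in practice you would have to repeat this per destination, with paths that can change at any time.

```python
def circuits_exposed(paths: dict[str, list[str]], failed_network: str) -> list[str]:
    """Return the circuits whose observed path crosses the failed network."""
    return sorted(c for c, path in paths.items() if failed_network in path)


def has_clean_path(paths: dict[str, list[str]], failed_network: str) -> bool:
    """True if at least one circuit avoids the failed network entirely."""
    return any(failed_network not in path for path in paths.values())


# Paths observed during the incident, keyed by our circuit.
PATHS_TO_AKAMAI = {
    "Verizon": ["Verizon", "AT&T", "Akamai EU", "Akamai US"],
    "Sungard": ["Sungard", "Level3", "Akamai EU", "Akamai US"],
}

for failed in ("AT&T", "Level3", "Akamai EU"):
    print(
        failed,
        "hits:", circuits_exposed(PATHS_TO_AKAMAI, failed),
        "| failover helps:", has_clean_path(PATHS_TO_AKAMAI, failed),
    )
```

An AT&T or Level3 problem only touches one circuit, so failover helps. Anything both paths share does not, and this only covers one destination on one day.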

Wondering how some of you guys would handle this.


