Wednesday, August 15, 2018

WARNING: New Spectrum BGP "Standards"

Just got off the phone with Spectrum/Charter/TWC/Brighthouse/Whatever they are now. Our BGP with them went down Tuesday at precisely 1AM. Sounds fishy? While you would prefer perfectly stable connections, it's pretty standard (in my experience) to have middle of the night random drops as providers perform maintenances without sending notifications. How professional! The exact timing is a dead giveaway.

My colleague (he wants me to refer to him here as Chuck Finley) opened a ticket, and was immediately told it was a fiber cut. Great! Update us as it gets fixed.

No updates throughout the day, and Chuck calls back. Now he's told it was an equipment migration. Super, fix it.

We start escalating with account managers and breathing fire. Chuck finds this in the logs:

%BGP-3-NOTIFICATION: sent to neighbor 192.0.2.1 active 2/2 (peer in wrong AS) 2 bytes 4E21

Yup, they botched their config.

He gets on the phone with them and gets them to fix this. BGP neighborship comes up, we get our default route, but our outbound advertisements are still not being preferred over our backup that we prepend 6 freakin times. Still escalating with account managers, who basically say "we're going home for the night, good luck!"

This morning Chuck finds that we are no longer even receiving the default route, 0 prefixes received. le sigh.

Calls them up yet again, and is told somehow they stopped giving us default and gave us Full Routes. We filter everything but default inbound. They put it back to default and we're up and running for outbound traffic, but route advertisements to them are still borked. Chuck goes through all the config and asks me to hop on a conference call and double check. I confirm the config is good on our end.

The Spectrum engineer says he's getting our routes prepended 3 times with 100 local preference. That's odd, since our route-map to him just matches on our prefixes and doesn't set anything. The only route-map that prepends 3 times also sets the local preference lower via communities. Our config hasn't changed since the BGP relationship bounced multiple times, so it's not like some latent config is stuck in the works. Just to humor him, I hard reset the BGP peering, and he claims the prepends went away. OK fine, still has nothing to do with not preferring that route over a 6x prepend that goes through 2 other ASes. While talking about that 6x prepend route he lets slip that the local pref on that route is 101.

WHAT?

It clicks that our local pref is only 100. I pull up my 'Charter BGP guide' (probably old/legacy, but most providers are relatively consistent with local preference communities). 120 is default for customer routes, 100 for peers, 80 for transit. He starts explaining about the new config standard they are pushing blah blah blah. He even gets someone from the Standards team on the line. I start questioning about why they are defaulting us to 100 and why, since local pref is significant within the AS, they are assigning our routes from transits to 101. Blah blah new standards. I ask for their new BGP guide. They have none, he's going to bring it up to the team and see if they can write something. Gotta wait 2 weeks and ask my account manager. He asks if either we can set 120 local pref via communities or he can have it hard coded. I'm happy to set it and do, then soft reset. Symptoms go away. Now I get to wait and bring it up over and over again until they actually fix their broken standards.

TLDR:

Once you're on the 'new standards' Spectrum will now by default prefer ANY OTHER PATH to your routes, even if it goes from Slovakia to China to Russia to South Africa, then back to you over 92 AS hops rather than going over your direct fiber link with them. Maybe I'm overreacting, but I feel like they just broke basic BGP.



No comments:

Post a Comment