Tuesday, February 9, 2021

BGP (1M routes and beyond) Hardware/Software recommendations (sensible at scale!) -- and a little rant about IPInfusion OcNOS

First, I apologize if I'm breaking any reddit rules I should know about. I've done a search on this topic here and don't believe I'm violating anything with the question or asking something easily googled. Thanks in advance!

The Internet is quickly closing in on 1M IPv4 routes and is already over 100K IPv6 routes (~200K IPv4 slots), which means a lot of 1M and 1.3M platforms are going to be trash. We used to use Cisco 6500/3BXLs and briefly flirted with Sup2Ts but found (the hard way) that their TCAM usage does not scale predictably and never really reaches the data sheet specs. They make a tiny note of this in the release docs, but ... well, you only find that out after you've migrated the whole network over.
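To make the slot math above concrete, here is a rough sketch in Python. It assumes, following the ratio quoted above, that one IPv6 route costs about two IPv4-sized slots; real hardware carves TCAM differently, so treat the numbers as back-of-the-envelope only. The function names are just for illustration.

```python
# Assumption (taken from the ratio in the post): one IPv6 route occupies
# roughly two IPv4-sized TCAM slots. Actual hardware varies.
IPV6_SLOTS_PER_ROUTE = 2


def ipv4_equivalent_slots(ipv4_routes: int, ipv6_routes: int) -> int:
    """Return total FIB demand expressed in IPv4-sized slots."""
    return ipv4_routes + ipv6_routes * IPV6_SLOTS_PER_ROUTE


def headroom(capacity_slots: int, ipv4_routes: int, ipv6_routes: int) -> int:
    """Return remaining slots; a negative value means the table does not fit."""
    return capacity_slots - ipv4_equivalent_slots(ipv4_routes, ipv6_routes)


if __name__ == "__main__":
    # Round figures from the post: ~1M IPv4 routes and ~100K IPv6 routes.
    v4, v6 = 1_000_000, 100_000
    print(f"demand: {ipv4_equivalent_slots(v4, v6):,} slots")
    for capacity in (1_000_000, 1_300_000, 2_000_000):
        print(f"{capacity:,}-slot platform headroom: {headroom(capacity, v4, v6):,}")
```

Even on paper, a nominal 1.3M platform has only about 100K slots left at those round numbers, and that is before any gap between data sheet and reality.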

So hey, there is all this wild and crazy new Open Source Hardware and Open Source Networking Software....

We were very excited about the massive routing potential of the Broadcom Qumran w/ extended TCAM, bought a few chassis, and had to go with IPInfusion OcNOS because we needed MPLS support.

At first, IPI was very friendly if a bit slow at addressing bugs and things. They had promised that they had a lot of experience with large BGP views from multiple SPs, etc, but it quickly became apparent that a fully-meshed BGP network of routers with many full and partial peers and IPv4/IPv6 dual stack was beyond their experience. Ok, no big deal -- we knew what we were getting into, a little hair of the dog.

However, we have been informed that they no longer even have a plan to support BGP over tunnel... which kind of kills any hope of fully-meshed BGP networking over anything larger than a single data center. So much for being an up & coming option vs Cisco or Juniper.

The hardware from Broadcom (via Dell and Edgecore) is fantastic though.

One way to work around the limits of OcNOS, if one were to consider that an option, would be to put an L2 or L2 + tunneling platform, or a non-fancy but fast L3 platform, in front of the inter-facility connections and run tunnels, VXLANs, or MPLS pseudowires so the routers think they are directly connected even if they aren't. [Option 1]

Another option would be to throw them out and replace them with a single platform that has "reasonable" features. I think those include OSPFv2/OSPFv3 [with full IPv4/IPv6 dual-stack capabilities], MPLS (including over OSPF), BFD, BGP (over 2M IPv4 routes or equivalent), some kind of tunneling between platforms (can be GRE, MPLS, pseudowire, IPsec, whatever) that will allow BGP to take a packet in and send it on to its exit router without worrying about the intervening hops, and wire-speed 10G/40G/100G. [Option 2]
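For anyone comparing candidates against that list, here is a small Python sketch that encodes the Option 2 requirements as a checklist. The feature tag names, the `Platform` class, and the two example boxes are made up purely for illustration; they do not describe any real product.

```python
from dataclasses import dataclass, field

# The Option 2 wish list, encoded as feature tags (tag names are invented here).
REQUIRED_FEATURES = {"ospfv2", "ospfv3", "mpls", "bfd", "bgp", "tunneling"}
REQUIRED_IPV4_ROUTES = 2_000_000       # "over 2M IPv4 routes or equivalent"
REQUIRED_PORT_SPEEDS_G = {10, 40, 100}  # wire-speed 10G/40G/100G


@dataclass
class Platform:
    """Hypothetical description of a candidate router."""
    name: str
    features: set = field(default_factory=set)
    ipv4_route_capacity: int = 0
    port_speeds_g: set = field(default_factory=set)


def missing_requirements(p: Platform) -> list:
    """Return a list of unmet requirements; empty means the platform qualifies."""
    gaps = [f"feature:{f}" for f in sorted(REQUIRED_FEATURES - p.features)]
    if p.ipv4_route_capacity < REQUIRED_IPV4_ROUTES:
        gaps.append("route-capacity")
    gaps += [f"port:{s}G" for s in sorted(REQUIRED_PORT_SPEEDS_G - p.port_speeds_g)]
    return gaps


if __name__ == "__main__":
    # Both candidates below are placeholders, not real hardware.
    box_a = Platform("example-box-a", set(REQUIRED_FEATURES), 2_000_000, {10, 40, 100})
    box_b = Platform("example-box-b", REQUIRED_FEATURES - {"tunneling"}, 1_300_000, {10, 40, 100})
    for box in (box_a, box_b):
        print(box.name, missing_requirements(box) or "meets the list")
```

The second placeholder fails on exactly the two points that bit us: no tunneling and not enough route capacity.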

In 2021, I don't think this is a crazy needs list or anything. OpenSwitch would be great if someone validated its MPLS support (I see a patch at the end of 2019 for L2 MPLS tunnels, but nothing much after that).

Having used Cisco for so long, the first idea was to go to the Cisco version of the Qumran chipset [aka the NCS] -- but the licensing/entitlement structure, based on what I've read on the Cisco website, is far from clear... it looks like you have to enable groups of 100G at a time, and then there are other features and other licenses that may need to be added. Working through a vendor or a Cisco rep for a small order is such a PIA; I'm here to avoid that pain and a 3-week exercise that ends in bad answers. If this is a viable option, I'm sure someone has experience with it. The NCS does NOT seem to be well received and people seem to just go with the ASRs. I don't know which ASRs to look at. I'd like to pick up a couple of platforms on eBay to test and prove out in the real world before getting burned again.

There is more I could talk about in terms of research into platforms, but I'm hoping someone will give me a quick spiritual kick to the head and point me in the right direction.

Is there a great platform that has sensible units on eBay I can try out for this set of needs (equivalent to what the 6500 represented in its heyday -- just worked, maybe a few quirks, but rock solid after that) -- from anyone, Cisco/Juniper or even like an Arista or [long list of names that say they do BGP]?

Is an open(er) solution like Exaware an option?

Is there a dumb platform that would meet the requirements of Option 1 and we just do a two-layer solution?

Spiritual kicks welcome, thank you for your time!


