Monday, July 20, 2020

Help with a service design approach that requires multiple routing tables

Hello, redditors

We have a pretty basic edge, core/distribution, and access DC network: a core/dist switch works as the gateway, with lots of L2 trunks towards the racks, where we terminate the VLANs on access switches. We also have a couple of routers working as the edge.

This is the old “easy” approach, and it works for us because we only use around 50 VLANs at most and our MAC/ARP entry count is well below the limits of the core/dist switch.

We have 2x transit providers (full BGP tables), 1x connection to an IXP, plus a scrubbing provider to keep us up in case of DDoS. Let’s call this group the “Common Services”, or CS.

There’s also a connection to a third provider. Let’s call this one “Special Routes”, or SR.

We have around 30 subnets, which we add and remove from time to time. The subnets are divided into two groups: “common customers” and “special customers”.

We also have some internal peering to servers that inject routes based on conditions like DDoS (injecting either a blackhole route or a special route for scrubbing).

We basically need a way to accomplish the following:

  1. Internal prefixes should talk to each other without issues through the core/dist

  2. “special customers” subnets should be able to use CS+SR inbound and outbound

  3. “common customers” subnets should only be able to use CS inbound and outbound

  4. We must keep the ability to inject the DDoS routes from the servers into either tier of service

The inbound part is quite easy with BGP policies (which is what I do atm: don’t advertise here or there based on communities).
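To make the inbound side concrete, here is a minimal Python sketch of the "advertise here or there based on communities" idea. It is only a model of the policy logic, not router configuration. The community values and upstream labels are made up for illustration and are not taken from the real setup.

```python
# Hypothetical upstream labels: CS is the transit/IXP/scrubbing group, SR the third provider.
CS_UPSTREAMS = {"transit1", "transit2", "ixp", "scrubbing"}
SR_UPSTREAMS = {"sr"}

# Hypothetical BGP communities used to tag each prefix with its service tier.
COMMON = "64512:100"
SPECIAL = "64512:200"


def upstreams_for(communities):
    """Return the set of upstreams a prefix may be advertised to."""
    allowed = set()
    # Both tiers are reachable through the Common Services group.
    if COMMON in communities or SPECIAL in communities:
        allowed |= CS_UPSTREAMS
    # Only the special tier is also advertised to the SR provider.
    if SPECIAL in communities:
        allowed |= SR_UPSTREAMS
    return allowed
```

A prefix tagged with neither community is advertised nowhere in this model, which fails closed if a new subnet is added without a tier.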

The outbound part, however, is driving me nuts, because CS and SR have routes in common; the difference is that SR routes are waaay lower latency (and waaay more expensive). This means that the “special customers” subnets must have a routing table that uses SR for some routes and CS for the others, while “common customers” should use a table that only has CS routes.
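The two-table requirement can be modelled in a few lines of Python. This is a sketch under assumptions: the route feeds below are invented (documentation prefixes only), and "earlier feed wins" stands in for whatever preference mechanism the real routers would use. It only shows what each group's table should end up resolving to.

```python
import ipaddress

# Hypothetical route feeds as (prefix, next hop label) pairs.
CS_ROUTES = [("0.0.0.0/0", "CS"), ("198.51.100.0/24", "CS")]
SR_ROUTES = [("198.51.100.0/24", "SR")]


def build_table(*feeds):
    """Merge feeds into one table; an earlier feed wins when a prefix repeats."""
    table = {}
    for feed in feeds:
        for prefix, next_hop in feed:
            table.setdefault(ipaddress.ip_network(prefix), next_hop)
    return table


def lookup(table, destination):
    """Longest-prefix match; returns None when no route covers the address."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in table if address in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


SPECIAL_TABLE = build_table(SR_ROUTES, CS_ROUTES)  # SR preferred where both have the prefix
COMMON_TABLE = build_table(CS_ROUTES)              # CS routes only
```

With these sample feeds, `lookup(SPECIAL_TABLE, "198.51.100.10")` resolves via SR, while the same lookup in `COMMON_TABLE` resolves via CS. A destination that only CS covers goes out via CS from either table.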

We have Cisco routers and switches (ASR1001X and Catalyst 3850), but can change them to other brands. I actually made some sort of spaghetti with MikroTik, since I didn’t manage to get it done with the Cisco gear. I used a feature of those routers that lets you create virtual routing tables: traffic is looked up in the virtual table first and, if no match is found, falls back to the global default table. This solution seems quite odd, to me at least, and I don’t see how it will scale. The CPU on those devices has gone quite high since I did this, and we would rather stick with Cisco for now (for other reasons).
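For readers unfamiliar with that kind of setup, the "virtual table first, then global table" behaviour can be sketched in Python as a chain of lookups. This is a rough model of the idea as described here, not of how any particular router implements it. The table contents are invented for illustration.

```python
import ipaddress


def longest_match(table, destination):
    """Return the next hop of the most specific covering prefix, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if address in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


def lookup_with_fallback(tables, destination):
    """Try each table in order; the first table with any match decides."""
    for table in tables:
        next_hop = longest_match(table, destination)
        if next_hop is not None:
            return next_hop
    return None


# Hypothetical contents: the virtual table carries only SR routes and no default,
# so anything SR does not cover falls through to the global table.
SR_ONLY = {"198.51.100.0/24": "SR"}
GLOBAL = {"0.0.0.0/0": "CS", "198.51.100.0/24": "CS"}
```

In this model, special customers would use `lookup_with_fallback([SR_ONLY, GLOBAL], dst)` and common customers `lookup_with_fallback([GLOBAL], dst)`. The fallback only works because the virtual table has no default route; a default there would capture everything.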

I’ve thought about VRFs and VRF leaking of some sort, but I am not completely sure about the approach. I’ve come up with some ideas, but they always look quite complex or hard to maintain, at least for our scenario.

I hope this was clear, any ideas would be really appreciated!


