Saturday, April 6, 2019

BGP for VIP advertisement (Load Balancer)

Hi all,

In our datacenter, we've implemented a subnet-per-rack architecture (the default gateway resides on the ToR switch), mainly to avoid large broadcast domains and spanning-tree issues. We use EVPN for some special cases that require a stretched L2, but we would rather avoid it at all costs: it is not only complicated, it also doesn't really eliminate all L2 issues.
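
A minimal sketch of such an addressing plan, assuming a hypothetical 10.10.0.0/16 fabric block carved into one /24 per rack, with the ToR taking the first usable address of each subnet as the rack's gateway. Every broadcast domain ends at the rack boundary, so only routed traffic crosses between racks.

```python
import ipaddress

def rack_subnets(fabric_prefix: str, rack_count: int, new_prefix: int = 24) -> dict:
    """Carve one subnet per rack out of a fabric-wide block (hypothetical plan).

    The first usable address of each rack subnet is reserved for the ToR,
    which acts as that rack's default gateway.
    """
    block = ipaddress.ip_network(fabric_prefix)
    plan = {}
    for rack_id, subnet in zip(range(1, rack_count + 1), block.subnets(new_prefix=new_prefix)):
        gateway = next(subnet.hosts())  # first usable address -> ToR gateway
        plan[f"rack{rack_id:02d}"] = (subnet, gateway)
    if len(plan) < rack_count:
        raise ValueError("fabric block too small for the requested number of racks")
    return plan

for rack, (subnet, gw) in rack_subnets("10.10.0.0/16", 3).items():
    print(rack, subnet, "gateway", gw)
# rack01 10.10.0.0/24 gateway 10.10.0.1
# rack02 10.10.1.0/24 gateway 10.10.1.1
# rack03 10.10.2.0/24 gateway 10.10.2.1
```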

The issue is load balancers (and high availability, or VIP advertisement in general): since there's no shared broadcast domain or subnet, conventional VIP failover via VRRP and similar protocols does not work, because those protocols need their peers on a shared segment whose subnet contains the VIP. This has led me to the idea of using BGP to advertise VIP addresses: servers peer with the ToR switch and advertise a VIP from a different subnet, with the next hop being the server itself (I believe Google's Maglev and Facebook's Katran use this approach, and Calico does something similar for Kubernetes).
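
To make the per-server variant concrete, here is a minimal sketch of a health-gated announcer. It assumes ExaBGP's text-based process API, where a helper process writes commands such as `announce route <prefix> next-hop self` to stdout; the VIP, the local service address, and the check interval are hypothetical, and the ExaBGP neighbor/process configuration itself is not shown.

```python
import socket
import sys
import time

VIP = "192.0.2.10/32"        # hypothetical VIP, configured locally on this server
SERVICE = ("127.0.0.1", 80)  # hypothetical local service backing the VIP
INTERVAL = 2.0               # seconds between health checks

def bgp_command(announce: bool, prefix: str) -> str:
    """Build an ExaBGP-style API command; 'next-hop self' makes this server the next hop."""
    action = "announce" if announce else "withdraw"
    return f"{action} route {prefix} next-hop self"

def service_is_up(addr, timeout: float = 1.0) -> bool:
    """Treat a successful TCP connect as 'healthy' (a deliberately simple check)."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def main() -> None:
    announced = False
    while True:
        healthy = service_is_up(SERVICE)
        if healthy != announced:
            # Emit a command only on state change, then flush so ExaBGP reads it promptly.
            sys.stdout.write(bgp_command(healthy, VIP) + "\n")
            sys.stdout.flush()
            announced = healthy
        time.sleep(INTERVAL)
```

To use it as the helper process, append a call to `main()` at the end of the script. Withdrawing the route when the local check fails is what takes a dead server out of the ToR's ECMP set.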

First question, networking redditors: are you aware of any existing solution that implements this architecture?

This architecture poses some challenges. Off the top of my head:

  1. BGP session design. You obviously don't want your ToRs to peer with every server (it's a real mess to manage and automate, and it creates issues with VM migrations). This implies some sort of centralized control plane, i.e. a single point of BGP peering between the fabric and a server that represents all load balancers. This is somewhat similar to OpenStack Neutron's BGP speaker topology.
  2. Security. What prevents one server from advertising another server's VIP? Does the automation behind such a solution configure route-maps (or some other routing policy) on the switches?
  3. How to create the actual interface holding the VIP. I was thinking about a loopback, or a Linux bridge with routing enabled on the server itself, but this seems a bit unconventional.
  4. Given the single point of peering from point 1, the obvious follow-up challenges: which component probes all load balancers/VIP holders to make sure they are up? What are the pros and cons of a centralized versus a distributed architecture? How do you make the centralized component highly available?
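
One way to approach the centralized speaker from point 1 is a reconcile loop: the speaker keeps a desired map of VIP to healthy next hops and diffs it against what it last announced. The sketch below assumes ExaBGP-style text commands and hypothetical addresses. Advertising several next hops for the same VIP over a single session generally requires BGP ADD-PATH (RFC 7911) on both ends, and the fabric must accept third-party next hops that are not the speaker itself.

```python
from dataclasses import dataclass, field

@dataclass
class CentralSpeaker:
    """Hypothetical central BGP speaker state: which (VIP, next hop) paths are live."""
    announced: set = field(default_factory=set)  # {(vip, next_hop), ...}

    def reconcile(self, desired: dict) -> list:
        """Diff the desired {vip: {next_hop, ...}} map against what is announced.

        Announcements are emitted before withdrawals (make-before-break), so a VIP
        does not briefly lose all of its paths while its holders change.
        """
        want = {(vip, nh) for vip, hops in desired.items() for nh in hops}
        commands = [f"announce route {vip} next-hop {nh}" for vip, nh in sorted(want - self.announced)]
        commands += [f"withdraw route {vip} next-hop {nh}" for vip, nh in sorted(self.announced - want)]
        self.announced = want
        return commands
```

Keeping the speaker stateless apart from this set also helps with high availability: a standby instance can rebuild the desired map from the probes and converge to the same announcements.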
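
For point 2, the usual answer is a per-neighbor inbound filter: each peer may only announce prefixes inside the block assigned to it, with everything else denied. On the switches that would be a prefix-list or route-map generated by automation; the Python below only models the accept/deny decision, and the peer addresses and VIP blocks are hypothetical.

```python
import ipaddress

# Hypothetical per-peer allowlist that automation would render into prefix-lists/route-maps.
ALLOWED = {
    "10.10.1.5": ["192.0.2.0/28"],
    "10.10.2.7": ["192.0.2.16/28"],
}

def accept_route(peer: str, prefix: str, allowed: dict = ALLOWED) -> bool:
    """Accept a prefix only if it falls inside a block assigned to this peer.

    Roughly mirrors a 'permit <block> le 32' prefix-list entry followed by an
    implicit deny; unknown peers get nothing.
    """
    net = ipaddress.ip_network(prefix)
    for block in allowed.get(peer, []):
        assigned = ipaddress.ip_network(block)
        if net.version == assigned.version and net.subnet_of(assigned):
            return True
    return False
```

Filtering on the fabric side matters because a compromised or misconfigured server cannot be trusted to filter its own announcements.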
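
For point 3, a common pattern is to put the VIP as a host route on the loopback or on a dedicated dummy interface, so the server accepts traffic for it locally while the ToR learns how to reach it only through BGP. A minimal sketch that builds the iproute2 commands for the dummy-interface variant, assuming a hypothetical interface name `vip0`:

```python
import ipaddress

def vip_interface_commands(vip: str, ifname: str = "vip0") -> list:
    """Return iproute2 argv lists that place a VIP on a dedicated dummy interface.

    A dummy interface has no physical carrier, so it stays up regardless of NIC
    state; reachability to the VIP comes solely from the BGP announcement.
    """
    addr = ipaddress.ip_interface(vip)
    host_len = 32 if addr.version == 4 else 128
    if addr.network.prefixlen != host_len:
        raise ValueError("VIP should be a host route (/32 or /128)")
    return [
        ["ip", "link", "add", ifname, "type", "dummy"],
        ["ip", "link", "set", ifname, "up"],
        ["ip", "addr", "add", str(addr), "dev", ifname],
    ]
```

Each list can be passed to `subprocess.run(argv, check=True)` on the server (root required). Using `lo` instead works the same way with the first two commands dropped and `dev lo` in the last one; a bridge is not needed because nothing has to be switched locally.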
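
Whichever component ends up probing the VIP holders in point 4 (a central poller or an agent on each server), it usually needs hysteresis so a single lost probe doesn't withdraw a route and flap ECMP. A minimal sketch of rise/fall counters in the style of HAProxy's `rise`/`fall` server settings; the thresholds are arbitrary example values.

```python
from dataclasses import dataclass

@dataclass
class HealthState:
    """Rise/fall hysteresis for one VIP holder.

    A holder is marked up only after `rise` consecutive successful probes and
    marked down only after `fall` consecutive failures, which damps route flaps.
    """
    rise: int = 2
    fall: int = 3
    up: bool = False
    _streak: int = 0  # consecutive probe results contradicting the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return True if the up/down state changed."""
        if probe_ok == self.up:
            self._streak = 0
            return False
        self._streak += 1
        threshold = self.rise if probe_ok else self.fall
        if self._streak >= threshold:
            self.up = probe_ok
            self._streak = 0
            return True
        return False
```

Only a `True` return from `record` should trigger an announce or withdraw, which keeps BGP churn proportional to real state changes rather than to probe noise.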

I'll be glad to hear your thoughts, ideas, implementation tips etc.

Cheers.
