Wednesday, August 7, 2019

Cumulus VRR: too good to be true?

TL;DR: Cumulus Linux's VRR just seems so simple and easy. What am I missing? Why don't all networks work like this? Why bother with VRRP/HSRP/GLBP? What are its gotchas or limitations?

From my reading, instead of running a protocol with dead timers, a master, etc., like VRRP or HSRP, VRR works on the anycast principle like this:

  • Both switches respond to every ARP request with an identical response
  • The host accepts either the first or second response (doesn't matter since they are identical)
  • The host (or downstream switch) sends traffic to either gateway depending on the L2 network (MLAG hashing, STP, etc.)
  • Whichever switch receives the traffic first accepts it and routes it on
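
To make sure I have the idea straight, here is a toy Python model of the four steps above. It is only a sketch of the anycast behavior as I understand it, not how Cumulus actually implements VRR. The virtual IP and MAC, the Gateway class, and the resolve and l2_pick helpers are all made-up names and values for illustration, and the modulo in l2_pick is just a stand-in for whatever MLAG hashing or STP would really decide.

```python
from dataclasses import dataclass, field
from itertools import permutations

# Hypothetical values for illustration only; not taken from a real config.
VIRTUAL_IP = "10.1.10.1"
VIRTUAL_MAC = "00:00:5e:00:01:01"


@dataclass
class Gateway:
    name: str
    routed: list = field(default_factory=list)

    def arp_reply(self, target_ip):
        # Every gateway answers for the shared virtual IP with the same MAC.
        if target_ip == VIRTUAL_IP:
            return (VIRTUAL_IP, VIRTUAL_MAC)
        return None

    def receive(self, dst_mac, packet):
        # Whichever gateway the L2 network delivers the frame to routes it.
        if dst_mac == VIRTUAL_MAC:
            self.routed.append(packet)
            return True
        return False


def resolve(gateways, target_ip):
    """Apply ARP replies in arrival order; a later reply overwrites an earlier one."""
    cache = {}
    for gw in gateways:
        reply = gw.arp_reply(target_ip)
        if reply is not None:
            ip, mac = reply
            cache[ip] = mac
    return cache


def l2_pick(gateways, flow_id):
    # Stand-in for MLAG hashing or STP: any deterministic choice works here.
    return gateways[flow_id % len(gateways)]


sw1, sw2 = Gateway("sw1"), Gateway("sw2")

# Build the host's ARP cache once for each possible reply arrival order.
caches = [resolve(order, VIRTUAL_IP) for order in permutations([sw1, sw2])]

# Send a few flows; the L2 path decides which switch gets each one.
for flow_id in range(4):
    gw = l2_pick([sw1, sw2], flow_id)
    gw.receive(caches[0][VIRTUAL_IP], f"flow-{flow_id}")
```

In this model the host ends up with the same ARP entry regardless of which reply arrives first, and both switches route whatever lands on them. There is no election or timer anywhere in it, which is exactly the part that seems too easy.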

My environment: I just got 2 new EdgeCore switches + Cumulus Linux, and am installing them as the core switches for my manufacturing campus and datacenter. I'm planning to do MLAG to each server (12 servers), MLAG to our Checkpoint firewall cluster, and MLAG to several of the IDF switches. Some other IDF switches will just have single uplinks for now.

The Cumulus switches will terminate L3 for all server and LAN VLANs and will route traffic onward to the firewalls (I'm using VRF-Lite for segmentation). Are there any issues with using VRR like this? Being a manufacturing plant, we do have random flaky devices out there, and it makes me wonder whether we'll have issues with devices choking on 2 ARP replies.


