Friday, November 20, 2020

Nexus 9K - VxLAN EVPN Multi-site - vPC BGW

Is anyone running Nexus 9Ks in NXOS mode with vPC BGW?
If so, I'd really like to hear about your general experience, and also your experience in two specific areas.

Info

VxLAN BGP EVPN Multi-Site seems to fit our requirements: it allows DCI over our L3VPN, and it lets traffic be routed symmetrically in and out of each DC.

I'm looking to deploy two 9300s in vPC BGW mode in each of our brownfield data centres, to begin migrating them to a modern VxLAN BGP EVPN Leaf/Spine fabric. So initially, the two switches will be the BGWs, Leafs, Spines, RPs and RRs. Later, the fabric will be scaled out with separate spines (running RR/RP) and leafs. This looks very standard going by the legacy DC migration slides in Cisco's Cisco Live presentations, so I expect it's a common deployment.

Query 1 - Reliability

vPC, BGW, EVPN, VxLAN, RR, RP, Leaf and Border Leaf seems like a lot to load onto one box. Has anyone had issues with reliability? Any issues with control plane failures on the various protocols?

Query 2 - Failover

Another issue is failover. Each of the two switches at each DC will have two uplinks.

  • One uplink to an L3VPN for DCI
  • One uplink for the per-vrf uplink traffic to various production L3VPN WANs.

If the per-vrf uplink fails on one switch, a route needs to be available via the other switch, which still has a working per-vrf uplink, so that orphan hosts on the failed switch can still reach the L3VPN WAN.

Initially, this looks simple: add "advertise-pip" in BGP on each of the switches in the vPC. The routes learned over the per-vrf BGP peering to the L3VPN are then advertised into the EVPN as type 5 routes with the physical IP (PIP) of the switch as the next hop, rather than the virtual IP (VIP).
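
For reference, this is roughly what that configuration looks like on each vPC member. It is only a sketch: the AS number 65001 is a placeholder, not our real value, and "advertise virtual-rmac" under the NVE interface is the companion command that normally goes with advertise-pip.

```
! Sketch only. AS 65001 is a placeholder.
router bgp 65001
  address-family l2vpn evpn
    ! Advertise type 5 routes with this switch's physical VTEP IP
    ! as the next hop instead of the shared vPC virtual IP.
    advertise-pip

interface nve1
  ! Companion command: advertise the virtual router MAC with VIP-sourced routes.
  advertise virtual-rmac
```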

From a control plane perspective this works: the type 5 routes are learned and installed on the switch with the failed uplink. But when that switch forwards the VxLAN encapsulated packets to the working switch, the working switch drops them. This looks to be because the source address of the packets is the VIP of the vPC, which both peers share, and there's some sort of split-horizon mechanism. Advertise-pip would work for other VTEPs in the network, but not for the vPC peer with the same VIP.

I thought this was a bug, but apparently not. Cisco's docs show this as expected behaviour, and the recommended solution is per-vrf BGP peerings or static routes between the vPC peers, each on a separate point-to-point VLAN link between switches, for each tenant. So the solution seems to be to bypass the VxLAN EVPN fabric.
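
As an illustration, a per-vrf iBGP peering for one tenant might look something like the sketch below, shown from one vPC peer. Everything specific here is made up for the example: the VLAN number 3901, the VRF name TENANT-A, the 10.255.1.0/30 addressing and AS 65001. The VLAN has to be carried between the two switches, and depending on platform it may also need to be listed under "system nve infra-vlans", so check the docs for your hardware. I've assumed next-hop-self so the peer doesn't have to resolve the WAN-side next hop.

```
! Sketch only. VLAN, VRF name, addressing and AS number are placeholders.
feature interface-vlan

vlan 3901
  name TENANT-A-PEERING

interface Vlan3901
  description Per-vrf point-to-point link to vPC peer
  no shutdown
  vrf member TENANT-A
  ip address 10.255.1.1/30

router bgp 65001
  vrf TENANT-A
    ! iBGP to the vPC peer inside the tenant VRF, bypassing the fabric.
    neighbor 10.255.1.2
      remote-as 65001
      address-family ipv4 unicast
        ! Assumption: rewrite the next hop to this switch's SVI address.
        next-hop-self
```

The other peer mirrors this with 10.255.1.2 as its own address and 10.255.1.1 as the neighbor, and the whole block is repeated per tenant with a different VLAN and subnet.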

This seems a bit untidy but does work in the lab. Is anyone running it like this in production, and has failover worked as expected?


