Wednesday, July 31, 2019

Issues with VXLAN + live migration in Linux

Long post. Thanks in advance for reading!

I have an all-layer-3 ECMP underlay network with routing on the hosts. The hosts are Linux machines running Proxmox (qemu/kvm) with VXLAN interfaces defined but no control plane: the bridge fdb table is statically populated with the other VTEP addresses. Each VXLAN interface is attached to a bridge (a regular Linux bridge, not OVS), and the VMs are attached to the bridges.
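For readers who haven't built this kind of setup, here is a rough sketch of what a statically configured VXLAN like the one described might look like. It only generates the iproute2 commands as strings and does not run them. The VNI, the bridge name, the addresses, and the helper name `static_vxlan_commands` are all placeholders invented for illustration, not values from my network, and the bridge is assumed to exist already.

```python
# Sketch only: builds the iproute2 commands for one statically configured
# VXLAN. Every value used below (VNI 100, vmbr100, 10.0.0.x) is hypothetical.

def static_vxlan_commands(vni, local_ip, remote_vteps, bridge, dstport=4789):
    """Return the commands for one VXLAN device with a static flood list."""
    dev = f"vxlan{vni}"
    cmds = [
        # Unicast VXLAN device sourced from this host's VTEP address.
        f"ip link add {dev} type vxlan id {vni} local {local_ip} dstport {dstport}",
        # Attach it to an existing Linux bridge (assumed to be present).
        f"ip link set {dev} master {bridge}",
        f"ip link set {dev} up",
    ]
    # One all-zero MAC entry per remote VTEP: broadcast and unknown-unicast
    # frames are replicated to each of these addresses.
    for vtep in remote_vteps:
        cmds.append(f"bridge fdb append 00:00:00:00:00:00 dev {dev} dst {vtep}")
    return cmds


if __name__ == "__main__":
    for cmd in static_vxlan_commands(100, "10.0.0.1", ["10.0.0.2", "10.0.0.3"], "vmbr100"):
        print(cmd)
```

The all-zero entries are what stands in for a control plane here: without them the host has nowhere to send ARP requests or frames for MACs it has not learned yet.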

When I live migrate a VM, it (mostly) loses its network connection. Here's what I've observed so far:

  • Pinging VMs attached to the same VXLAN, but living on different hosts, works intermittently. I get a reply for maybe a quarter to a third of the requests sent.

  • tcpdump on both hosts shows replies exiting the ping target just fine, but the migrated VM's host never seems to receive some of them.

  • The VM cannot ping the gateway at all. It never receives the ARP reply sent by the gateway. The gateway is also a member of the VXLAN, but it differs from everything else in two ways: it has 4 ECMP routes into the underlay rather than 2, and it's a Fortigate.

  • On all hosts and the gateway, I've checked the bridge fdb table after the migration and can confirm that the new host sent the gratuitous ARP and that all hosts know the MAC now lives on the new host.

  • For good measure, I've run captures on the old host as well, to make sure no traffic for the VM is still arriving there. There is none.

  • To keep a long story short, I discovered that forcing an OSPF change/route reconvergence on the network fixes it. To dig into that a bit further, I did another migration, started a ping from the migrated VM to a VM on another host (B), and flushed the route cache on host B (ip route flush cache). This cleared up the problem for all VMs on host B, but not for any other host, including the gateway.

  • We use keepalived to float VIPs between servers on different hosts with VRRP, and that works perfectly. We've never had an issue with it.
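The fdb check from the list above is easy to script if you want to repeat it on every host after each migration. This is a minimal sketch, assuming the usual `bridge fdb show` line layout (MAC first, with the remote VTEP after a `dst` keyword). The sample text, the device name, and the MAC are made up for illustration and are not output from my hosts.

```python
import subprocess

# Illustrative text in the shape of `bridge fdb show dev vxlan100` output.
# The addresses and MACs are invented, not captured from a real host.
SAMPLE = """\
00:00:00:00:00:00 dev vxlan100 dst 10.0.0.2 self permanent
00:00:00:00:00:00 dev vxlan100 dst 10.0.0.3 self permanent
52:54:00:12:34:56 dev vxlan100 dst 10.0.0.3 self
52:54:00:12:34:56 dev vxlan100 master vmbr100
"""


def vteps_for_mac(fdb_text, mac):
    """Return the remote VTEP addresses the fdb lists for a MAC, in order."""
    mac = mac.lower()
    vteps = []
    for line in fdb_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].lower() != mac:
            continue
        # Only VXLAN entries carry a 'dst <ip>' pair; bridge entries do not.
        if "dst" in tokens:
            i = tokens.index("dst")
            if i + 1 < len(tokens):
                vteps.append(tokens[i + 1])
    return vteps


def read_fdb(dev):
    """Fetch the live fdb for one device (needs iproute2 on the host)."""
    result = subprocess.run(
        ["bridge", "fdb", "show", "dev", dev],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(vteps_for_mac(SAMPLE, "52:54:00:12:34:56"))
```

On a real host you would pass `read_fdb("vxlan100")` instead of `SAMPLE`, then compare the result against the VTEP address of the host the VM migrated to.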

Now, after digging a little, it seems that route caching was removed from the Linux kernel, I think in 3.6, so I'm not sure why flushing the route cache solves the problem. In any case, I'm extremely confused. Does anyone here have ideas as to what's going on?


