Monday, September 27, 2021

Aggressive ARPs / Duplicate ARPs 4500-X / HPE Enclosure / VM Blades

Hello, I have inherited a new client for my company. Switches 1/2 are 4500-X, switches 3/4 are 3850.

Switch 1 is the active HSRP router when times are good and as such holds the VIPs for the subnets; there are around 60 subnets with VRFs. Between 6am(ish) and 6pm(ish) I notice the CPU jump from a 10% average to a 50% average, often accompanied by some OSPF retransmissions. After grabbing a CPU capture of the traffic being punted, I found Switch 1 was overwhelmed with ARP requests from VMs sourced from the HPE enclosure/ESX blades, always with a duplicate request. This environment is a mix of VDIs and always-on servers. Storm control is out of the question on these enclosure uplinks, as going err-disabled is a big no-no. There's also no control plane policing (CPP), which I want to add to guarantee cycles for OSPF/HSRP, but at the risk of dropping legitimate ARP traffic.
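For anyone wanting to put numbers on "always with a duplicate request", here is a rough Python sketch of how the punted ARP requests could be tallied per sender. It assumes the capture has already been exported to simple (timestamp, sender MAC, target IP) tuples; the function name `summarize_arp`, the 10 ms duplicate window, and the sample MACs/IPs are all made up for illustration and not taken from the actual capture.

```python
from collections import Counter


def summarize_arp(requests, dup_window=0.01):
    """Count ARP requests per sender and flag near-instant repeats.

    requests   -- iterable of (timestamp_seconds, sender_mac, target_ip)
    dup_window -- assumed threshold: a repeat of the same (sender, target)
                  pair within this many seconds counts as a duplicate
    """
    per_sender = Counter()   # total ARP requests seen from each sender
    duplicates = Counter()   # requests that repeat the previous one too quickly
    last_seen = {}           # (sender, target) -> timestamp of last request

    for ts, sender, target in sorted(requests):
        per_sender[sender] += 1
        key = (sender, target)
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= dup_window:
            duplicates[sender] += 1
        last_seen[key] = ts

    return per_sender, duplicates


# Hypothetical sample data, not from the real capture.
sample = [
    (0.000, "00:50:56:aa:aa:01", "10.0.0.1"),
    (0.001, "00:50:56:aa:aa:01", "10.0.0.1"),  # repeat within 10 ms
    (0.500, "00:50:56:aa:aa:02", "10.0.0.1"),
    (1.200, "00:50:56:aa:aa:01", "10.0.0.1"),  # repeat, but well outside the window
]

per_sender, duplicates = summarize_arp(sample)
for mac, total in per_sender.most_common():
    print(mac, "requests:", total, "duplicates:", duplicates[mac])
```

Sorting the senders by total request count is only one way to look at it; the same tuples could just as easily be bucketed by hour to line up against the 6am to 6pm CPU pattern.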

Here is a small capture I got from NSX in vSphere. The host I ran the capture on was not the host I was monitoring; the two hosts are on the same VLAN on the same dSwitch. NB: this behaviour is not unique to the host I was monitoring in the capture. Also pay attention to the timestamps.

https://pasteboard.co/IOlgr8k6G40S.png

The sysadmins have gone from an active/active configuration to active/standby on the vSwitches to stop excessive MAC flaps, although the ARP volume is still a little ridiculous.

I have the ESX/sysadmins telling me there is a loop "somewhere", but I have validated there is not. I also engaged TAC to validate this, and they are in agreement. Has anyone seen behaviour like this from HPE/ESX blades? (Sorry if this should be crossposted to sysadmin.)

Any pointers appreciated, as is a whiskey or thoughts and prayers at this point :D


