I'm having strange issues with several Nexus 9Ks (all running NX-OS 7.0(3)I7(3), with the ntp feature disabled). None of them are dropping the packets due to CoPP (at least according to the CoPP stats). In any case, our traffic goes from a VM (on a host connected to one 9K) to another VM on a different host connected to a different 9K, via some intervening routers or the Internet.
In every case we've identified, the "default router" IP is hosted on the 9K, either directly on the VLAN interface, like:
int vlan 10
ip address 1.2.3.4/24
or via HSRP in the VLAN.
The ports are set as switchport access vlan with no other settings.
Packet loss between hosts can be as high as 30% or as low as 0-1%, with no rhyme or reason. The access ports (10G copper) are uncongested (<300 Mb/s), and the uplinks are 10G or 40G optical, each also carrying under 1 Gb/s. Packets are not dropped in a regular pattern like 5 in a row or every 10th; it's more like 70-80 received, then a drop, then 20, then a drop, then random amounts in between, depending on whether we are seeing high or low losses.
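To put numbers on a loss pattern like this, one option is to capture plain Linux `ping` output and measure the runs of received packets between drops. The sketch below is a minimal illustration, not something from the original troubleshooting: it assumes iputils-style reply lines containing `icmp_seq=N` and that `ping` was run without `-O` (which prints "no answer yet" lines that also contain `icmp_seq=`). The function names and the sample address are hypothetical.

```python
import re


def received_seqs(ping_output: str) -> list[int]:
    """Extract icmp_seq values from iputils ping reply lines (assumes no -O flag)."""
    return [int(m.group(1)) for m in re.finditer(r"icmp_seq=(\d+)", ping_output)]


def loss_runs(seqs: list[int], first: int, last: int) -> tuple[list[int], list[int]]:
    """Return (good_runs, loss_bursts): lengths of consecutive received and lost sequence numbers."""
    got = set(seqs)  # a set also absorbs any (DUP!) replies
    good_runs: list[int] = []
    loss_bursts: list[int] = []
    run_ok = run_lost = 0
    for s in range(first, last + 1):
        if s in got:
            if run_lost:  # a loss burst just ended
                loss_bursts.append(run_lost)
                run_lost = 0
            run_ok += 1
        else:
            if run_ok:  # a run of good replies just ended
                good_runs.append(run_ok)
                run_ok = 0
            run_lost += 1
    # flush whichever run was still open at the end
    if run_ok:
        good_runs.append(run_ok)
    if run_lost:
        loss_bursts.append(run_lost)
    return good_runs, loss_bursts


# Hypothetical sample: 10 pings sent, seqs 4, 8 and 9 never answered.
sample = "\n".join(
    f"64 bytes from 10.0.0.2: icmp_seq={i} ttl=63 time=0.4 ms"
    for i in range(1, 11)
    if i not in (4, 8, 9)
)
good, lost = loss_runs(received_seqs(sample), 1, 10)
print(good, lost)  # [3, 3, 1] [1, 2]
```

Short loss bursts separated by long, irregular good runs (as described above) look different from sustained congestion drops, which tend to cluster into longer bursts.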
The hosts are Linux/Ubuntu HP or Dell. We are trying an experiment where we move the default router from the N9K to a Cisco 6500 in the same VLAN and that seems to have removed the packet loss, at least for now.
We didn't have them originally, but we've just added:
no ip proxy-arp
no ip unreach
to the VLAN config (the 6500 already had both), but we haven't tested further.
If I'm missing details, please let me know. These N9Ks aren't doing much more than replacing 6500s doing aggregation; they run OSPF, BFD and some VLANs. Some use the default MTU and some use jumbo MTU, but since OSPF adjacencies are up (OSPF won't reach FULL across an MTU mismatch), we are assured our MTU sizes are correct.
In any case, our pings are never over 1500 bytes.
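One detail worth pinning down when saying pings are "never over 1500 bytes": with Linux iputils `ping`, the `-s` value is the ICMP payload, not the packet size, so the IPv4 packet on the wire is 28 bytes larger (20-byte IPv4 header without options, plus the 8-byte ICMP echo header). The sketch below just does that arithmetic; the function name is hypothetical, and it assumes IPv4 with no IP options.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` payload that fits in a single IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000, 9216):
    print(mtu, max_ping_payload(mtu))
```

So on a 1500-byte-MTU path, `ping -M do -s 1472` is the largest ping that should pass unfragmented with DF set, while `-s 1500` produces a 1528-byte packet that must fragment (or fail with DF set).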
We've swapped optics, we've swapped ports, and we've compared other switches, but none of it makes the problem go away for long. Moving the L3 interface to the 6500 is a new attempt to isolate whether it's an uplink issue or a switch/software issue.
I would very much appreciate suggestions of where to look, or "hey, there is a bug that covers this that you missed", or "this is a known issue because you assumed setting XXX in IOS was the default in NX-OS", etc.
Thanks!