Monday, November 25, 2019

Aruba 8325 - VMware - Windows Server - ECN problems

Networkers!

We have just deployed 4 new Aruba 8325 switches in our data center, and we are facing some issues.

Here is the setup:

2 x 8325 switches in Computer Room 1 - connected with a 2 x 100G LAG (running VSX)
2 x 8325 switches in Computer Room 2 - connected with a 2 x 100G LAG (running VSX)

The "switch clusters" in Computer Room 1 & 2 is connected with 4 x 10G interfaces, bundled together in one LAG.

In Computer Room 1 we have multiple ESX (VMware) hosts redundantly connected to the two 8325s.
They are connected with 25G interfaces (DAC).
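
For context, a VSX pair with an inter-switch link LAG looks roughly like this in AOS-CX (a simplified sketch, not our actual config; the interface numbers and keepalive addresses are placeholders):

    ! ISL LAG between the two VSX members
    interface lag 256
        no shutdown
        no routing
        lacp mode active
    interface 1/1/55
        no shutdown
        lag 256
    interface 1/1/56
        no shutdown
        lag 256
    ! VSX context on the primary member
    vsx
        inter-switch-link lag 256
        role primary
        keepalive peer 10.255.255.2 source 10.255.255.1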

Problem

What we have found is that moving data between the Windows Server guests (Windows Server 2012 and newer) is very slow (20-50 MB/s). Moving data between the Linux servers is as fast as expected.
vSAN and vMotion are also running without any problems.
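
One way to see what is actually happening on the wire is to capture the traffic (for example on a Linux guest, or via a port mirror) and filter on the ECN bits. A rough example with tcpdump; the interface name is just a placeholder:

    # TCP segments with ECE or CWR set (ECN feedback between the endpoints)
    tcpdump -ni eth0 'tcp[13] & 0xc0 != 0'

    # IPv4 packets with a non-zero ECN field (ECT(0), ECT(1) or CE markings)
    tcpdump -ni eth0 'ip[1] & 0x3 != 0'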

Previously the same ESX/VMware hosts were connected to old 5412 switches with 10G interfaces.
We never had performance issues in that environment.

Our VMware team has been troubleshooting all weekend with VMware, Dell, Microsoft and Mellanox.
So far it seems that the slow speed between the Windows servers is caused by the ECN feature.
If we disable ECN on the Windows Servers, the speed is as expected.
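
For reference, ECN for TCP can be checked and disabled on a Windows guest from an elevated command prompt (it is a global setting and applies to new connections):

    :: show the current state (look for "ECN Capability")
    netsh int tcp show global
    :: disable ECN negotiation for new TCP connections
    netsh int tcp set global ecncapability=disabled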

We have contacted Aruba support and asked them to investigate this, and they are currently analyzing our logs.
I can also see that it's possible to create some kind of ACL in the 8325 switches that matches on the ECN bits.
I'm not really sure how that works yet. It seems it can be used together with QoS policies in some way, or maybe to "strip" the ECN info from the packets.
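
From what I can tell, it would look something like the sketch below: an IPv4 ACL that matches on the ECN field (value 3 = Congestion Experienced) and counts the hits, applied inbound on a server-facing port. The names and interface are made up and I have not tested the exact syntax on our AOS-CX version, so treat it as a rough idea only:

    access-list ip DETECT-ECN
        10 permit any any any ecn 3 count
        20 permit any any any
    interface 1/1/1
        apply access-list ip DETECT-ECN in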

Interested to hear if any of you have had similar problems on this (or any other) switching platform.
And of course whether you have found any solutions other than disabling ECN on the Windows Server guests.

/Kenneth


