Hello r/networking
Our data center has recently been running into problems with some unusually heavy traffic, and I was hoping to find others who have experienced the same or similar problems.
We've had a massive SAS job run, which has generated some heavy traffic.
While doing real-time polling with SolarWinds, we found that a single link in a port-channel has been responsible for 99.5% of all discards and is running at full link utilization, while the other 3 links sit at about 33% utilization.
These discards usually occur during microbursts of data, but they cause serious problems: the retransmissions add latency on our storage, causing several VMs to drop their drives and malfunction.
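To illustrate why microbursts can cause discards even when average utilization looks modest, here is a rough Python sketch of a single egress queue. Everything in it is an assumption for illustration: the 10 Gb/s line rate, the 500 KB buffer, the 1 ms tick, and the traffic patterns are made up and are not measurements from our environment.

```python
# Toy model of one egress port: traffic arrives, the port drains at line
# rate, and anything that does not fit in the buffer is discarded.
# All numbers are hypothetical, not taken from real hardware.

def simulate_queue(arrivals_bytes, drain_bytes_per_tick, buffer_bytes):
    """Return total bytes discarded over the whole arrival sequence."""
    queue = 0
    dropped = 0
    for arriving in arrivals_bytes:
        queue += arriving
        queue -= min(queue, drain_bytes_per_tick)  # send what the link allows
        if queue > buffer_bytes:                   # the rest must fit in buffer
            dropped += queue - buffer_bytes
            queue = buffer_bytes
    return dropped


TICKS = 1000                  # 1000 ticks of 1 ms = 1 second
DRAIN = 1_250_000             # 10 Gb/s is 1.25 MB per millisecond
BUFFER = 500_000              # assumed per-port buffer size

# Same total volume, delivered smoothly or in short bursts every 10 ms.
smooth = [400_000] * TICKS
bursty = [4_000_000 if t % 10 == 0 else 0 for t in range(TICKS)]

avg_utilization = sum(bursty) / (TICKS * DRAIN)
smooth_drops = simulate_queue(smooth, DRAIN, BUFFER)
bursty_drops = simulate_queue(bursty, DRAIN, BUFFER)

print(f"average utilization: {avg_utilization:.0%}")
print(f"smooth drops: {smooth_drops} bytes")
print(f"bursty drops: {bursty_drops} bytes")
```

Both patterns average 32% utilization over the second, yet only the bursty one discards traffic, which is why polled averages can hide the problem.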
So the question is: do you have any recommendations on how to deal with these microbursts? My colleagues have split into two camps. One says, "That's just how L2 port-channels treat traffic: they distribute flows rather than load-balancing individual packets, so we need I/O control." The other says we should route the traffic at L3, which would let us use better load balancing across these links.
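To make the first camp's argument concrete, here is a minimal Python sketch of flow-based distribution on a 4-link port-channel. It is only an illustration: `pick_link` and `bytes_per_link` are hypothetical helpers, CRC32 stands in for whatever vendor-specific hash the switch really uses, and the addresses, ports, and byte counts are made up.

```python
import zlib

NUM_LINKS = 4  # the port-channel in this post has four member links


def pick_link(src_ip, dst_ip, src_port, dst_port, num_links=NUM_LINKS):
    """Map a flow to one member link.

    Hypothetical stand-in: real switches use vendor-specific hash inputs
    and functions, not CRC32 over a string.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_links


def bytes_per_link(flows, num_links=NUM_LINKS):
    """Sum the bytes each link carries when whole flows are pinned to links."""
    load = [0] * num_links
    for src_ip, dst_ip, src_port, dst_port, nbytes in flows:
        load[pick_link(src_ip, dst_ip, src_port, dst_port, num_links)] += nbytes
    return load


# Made-up traffic: one 10 GB "elephant" flow plus thirty 100 MB flows.
elephant = ("10.0.0.10", "10.0.1.20", 51000, 5000, 10_000_000_000)
mice = [
    (f"10.0.0.{i + 11}", "10.0.1.20", 40000 + i, 5000, 100_000_000)
    for i in range(30)
]
flows = [elephant] + mice

load = bytes_per_link(flows)
elephant_link = pick_link(*elephant[:4])

for link, nbytes in enumerate(load):
    marker = "  <- carries the elephant flow" if link == elephant_link else ""
    print(f"link {link}: {nbytes / 1e9:.1f} GB{marker}")
```

Because every packet of a flow hashes to the same link, the link that receives the single large flow carries at least 10 GB of the 13 GB total no matter how the small flows spread out. Whether routing at L3 would change this depends on how the routed path hashes traffic, which this sketch does not model.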
Mods: I'm sorry in advance if this breaks any rules. While I'm certified, it's only CCNA Routing & Switching, and I'm fairly new to this data center position. I will do my best to provide any needed information.