Friday, November 17, 2017

Sanity Check: Seeing some strange delays on inbound TCP packets

A little backstory. We started noticing an issue yesterday with our VoIP provider, RingCentral: calls were not establishing audio from us to our customers. We also started seeing issues with data streamed over TCP from our proprietary devices in the field. Using Wireshark I was able to capture some good examples of the TCP delays, but I have not been able to narrow down anything on the RingCentral side. Just wanted to get y'all's opinion on potential causes.

A 100,000 ft view of our process: we have two devices that connect to a server to transmit data from our application to the customer. They link to each other through this connection server via an application that sends data over TCP. Outbound transmissions are fine, but inbound traffic from the server to the devices on our side periodically sees delays of 2+ seconds, which show up on the Time-Sequence graph in Wireshark. It looks like a staircase. On its own, this would suggest to me a problem with the application, but combined with the intermittent RingCentral audio issues I find it hard to believe it is just a coincidence.
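For anyone who wants to put numbers on the staircase instead of eyeballing the graph, here is a minimal sketch that flags the flat steps. It assumes the arrival times of the inbound packets have already been exported from the capture as seconds (for example with tshark's `frame.time_relative` field). The function name `find_stalls`, the 2-second default threshold, and the sample timestamps are all made up for illustration, not taken from our captures.

```python
from typing import List, Tuple


def find_stalls(timestamps: List[float], threshold: float = 2.0) -> List[Tuple[float, float]]:
    """Return (time_of_last_packet_before_gap, gap_seconds) for every
    inter-packet gap that is at least `threshold` seconds long."""
    stalls = []
    ordered = sorted(timestamps)
    # Compare each packet's arrival time with the one that follows it.
    for prev, curr in zip(ordered, ordered[1:]):
        gap = curr - prev
        if gap >= threshold:
            stalls.append((prev, gap))
    return stalls


# Hypothetical inbound arrival times in seconds (illustrative only).
sample = [0.00, 0.05, 0.10, 2.30, 2.35, 2.40, 4.90, 4.95]
stalls = find_stalls(sample)

for start, gap in stalls:
    print(f"stall after t={start:.2f}s lasting {gap:.2f}s")
```

If the stalls turn out to be evenly spaced or always the same length, that would point more toward a timer somewhere (application or retransmission) than toward random loss, though that is only a guess until the capture is checked.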

Our provider found no issues on our primary circuit, but we have failed over to our backup anyway. I am currently waiting for our devs to test and try to recreate the issue. I have not been able to find anything in our networking environments that would explain it. We are not doing any QoS on any of the network gear that is part of the server and device networks. We are doing QoS on the gear that carries the VoIP connections.

I have checked all of our network monitoring and found zero problems. I checked the interfaces and found no issues. The amount of data actually transmitted by our devices is very small, but the packet count can be pretty high. No link is at more than 20% utilization except for the interfaces connected to our ISPs, which sit at around 30% to 40% utilization at peak times. Am I missing something obvious?


