I am troubleshooting some issues between an application server and a backend database cluster. The application's logs report intermittent connection loss to the database.
I would like to sanity check my understanding to make sure I am on the right path.
When I look at the packet capture in Wireshark, I see a number of TCP RST messages between the App server and the DB server (~2% of total traffic). These come in request/response pairs. For example, in the first message (the request), the App server connects to the DB server. The Ethernet frame for that message shows the MAC address of the outgoing interface on the App server as the source address and another MAC address as the destination address. If I am not wrong, this is the MAC address of the next hop, right?
The IP header on that message shows the IP of the App server as the source address and the IP of the database server as the destination address. It has a TTL of 64. The tcpdump was captured on the App server. I am assuming the TTL is 64 because the packet is being captured on its way out of the box and has not yet been decremented.
Next, when I look at the response message, the IP header shows the source IP as the IP of the DB server and the destination IP as the IP of the App server. In the Ethernet frame, the destination MAC address is the App server's network interface, and the source MAC address appears to be that of some networking device, because the first three pairs of hex values correspond to a router/switch manufacturer. I am assuming that is the last hop before the packet gets back into the App server. This is also the message that shows the TCP RST in the TCP portion of the message. It has a TTL of 50, which suggests the packet left the DB server and made several hops across the network.
Traceroute is blocked, so I don't know exactly how many hops it would need.
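Even without traceroute, the hop count can be roughly estimated from the TTL seen in the capture. Here is a minimal sketch of that arithmetic, assuming the sender started from one of the common default TTLs (64, 128 or 255); the function name `estimate_hops` is just something I made up for illustration, and the result is only a guess if the sender uses a non-default TTL.

```python
from typing import Tuple

# Common default initial TTLs (an assumption; a host can be configured otherwise).
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> Tuple[int, int]:
    """Return (assumed_initial_ttl, estimated_hops) for a TTL seen in a capture."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # Assume the smallest common default that is not below the observed value.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial, initial - observed_ttl


print(estimate_hops(64))  # outgoing packet captured on the sender: (64, 0)
print(estimate_hops(50))  # the RST response: (64, 14)
```

If the DB server really starts at 64, a TTL of 50 would mean about 14 routed hops between the sender of that packet and the App server.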
What I am trying to determine is what is causing those TCP connection resets. My hunch is that there is a faulty networking device somewhere in the path that is resetting the connection. Initially I thought the device whose MAC address appears as the source in the response message (the one with the TCP RST) might be the one doing the resets, and that's why its MAC address is in there. But now I am thinking it is in there just because it is the last hop before the packet gets to the NIC of the App server. I am a little confused at this point.
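One check I am considering is comparing the TTL of the RST packets with the TTL of ordinary packets from the same source IP. If a device in the path forged the reset, its TTL may differ from what the DB server's own packets usually arrive with. Below is a rough sketch of that comparison, assuming the capture has already been exported to `(source_ip, ttl, is_rst)` tuples; the function name, the tuple layout and the sample values are all hypothetical and not taken from my actual capture.

```python
from collections import Counter, defaultdict


def suspicious_resets(packets):
    """Return (src, rst_ttl, usual_ttl) for RSTs whose TTL differs from the
    most common TTL seen on non-RST packets from the same source IP."""
    baseline = defaultdict(Counter)
    for src, ttl, is_rst in packets:
        if not is_rst:
            baseline[src][ttl] += 1

    flagged = []
    for src, ttl, is_rst in packets:
        # Skip sources with no non-RST traffic: there is nothing to compare against.
        if is_rst and baseline[src]:
            usual_ttl = baseline[src].most_common(1)[0][0]
            if ttl != usual_ttl:
                flagged.append((src, ttl, usual_ttl))
    return flagged


# Hypothetical sample data (documentation-range IP, invented TTLs).
sample = [
    ("192.0.2.10", 50, False),
    ("192.0.2.10", 50, False),
    ("192.0.2.10", 50, True),   # RST with the usual TTL
    ("192.0.2.10", 61, True),   # RST with a different TTL
]
print(suspicious_resets(sample))  # [('192.0.2.10', 61, 50)]
```

A mismatch would only be a hint, not proof: routes can change between packets, and a matching TTL does not rule out a middlebox either.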
Any pointers are appreciated.