Monday, January 6, 2020

Tracking down a network issue.

Dear networking people of Reddit,

I have a subnet (10.123.0.0/24, gateway 10.123.0.1) that has 3rd party equipment on it for a service hosted for our guests, who are on 10.178.0.0/24. The 3rd party is using 10.123.0.12-20 and has two stacked switches and several servers on that subnet. Currently the guests are having issues connecting to the service through a local URL; to my knowledge the local URL (notheacualsite.com) actually resolves to 10.123.0.16. When I traceroute to this address, I get multiple hops (2-3 on average) that all show this same address, and the same thing happens with the .17 address (I don't know the relevance of .17). When I traceroute to any other address on that subnet, I do not have any issues. I do not have any documentation on their setup, but .16 is a server and I am guessing .17 is too. These IPs are also NAT'd so that the 3rd party can get to their equipment (I believe). Communication is intermittent: the service works for a minute or so at a time and then stops working for a similar amount of time. The service is all internal to our network.
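To make the repeated-hop symptom easier to spot and compare between targets, here is a minimal Python sketch that scans saved traceroute output and reports any address that appears at more than one hop. The function name `repeated_hops` and the regex are my own, and it assumes each hop line starts with the hop number and contains an IPv4 address, which may not match every platform's output.

```python
import re

# Matches the hop number at the start of a line and the first IPv4 address on it.
HOP_RE = re.compile(r"^\s*(\d+)\s.*?(\d{1,3}(?:\.\d{1,3}){3})")


def repeated_hops(traceroute_output):
    """Return {address: [hop numbers]} for addresses seen at more than one hop."""
    seen = {}
    for line in traceroute_output.splitlines():
        match = HOP_RE.match(line)
        if match:  # header lines and "* * *" timeouts do not match and are skipped
            hop, addr = int(match.group(1)), match.group(2)
            seen.setdefault(addr, []).append(hop)
    return {addr: hops for addr, hops in seen.items() if len(hops) > 1}
```

Passing it the text of a traceroute to 10.123.0.16 should list that address with several hop numbers, while a traceroute to an unaffected address on the subnet should return an empty dict.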

Their equipment is connected to ours through a port channel (access VLAN 123), and to my knowledge both channels are using LACP. I say "to my knowledge" because we use the same equipment and these switches can only port channel using LACP.

This problem does not happen when I change my subnet to 10.234.0.0/24. From there I get normal traceroutes to all the IP addresses in the 123 subnet. This confuses me a little.

To me it looks like they are having issues on their side, since it is only happening on two IP addresses in that subnet. I have a ticket in with them letting them know we are having issues getting to their service. I didn't tell them anything else and am waiting for a response before I give them any details. I did this mainly because I am relatively new to networking and need some advice/help with this issue before I start shooting from the hip.

Some additional information: I recently moved these connections from one of our older switches to a newer switch. There are no trunking/pruning differences between the two switches. Not sure if that is relevant, but it's a change that was made.

Networks are obviously hypothetical since the specific addressing doesn't really matter at this point. 10.123.0.16 is also not pingable; this is by design on their side. I don't even know if what is posted above is the issue, but it's the only thing I can find that could be causing poor communication.
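Since ping is off on .16, one way to put numbers on the "works for a minute, stops for a minute" pattern would be a small TCP probe that logs when the service goes up and down. This is only a sketch: the function names are mine, and the port is an assumption (I don't actually know which port the service listens on, so 443 in the usage note is a placeholder).

```python
import socket
import time


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def state_runs(samples):
    """Collapse [(timestamp, is_up), ...] into [(is_up, start, end), ...] runs."""
    runs = []
    for stamp, is_up in samples:
        if runs and runs[-1][0] == is_up:
            # Same state as the previous sample: extend the current run.
            runs[-1] = (is_up, runs[-1][1], stamp)
        else:
            runs.append((is_up, stamp, stamp))
    return runs


def watch(host, port, interval=5.0, count=60):
    """Probe host:port `count` times, `interval` seconds apart; return the runs."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), tcp_reachable(host, port)))
        time.sleep(interval)
    return state_runs(samples)
```

Something like `watch("10.123.0.16", 443)` run from a guest-subnet machine and again from the 10.234.0.0/24 subnet would give up/down runs with timestamps that could be compared and handed to the 3rd party.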

Any advice or troubleshooting steps I can take to narrow this down would be greatly appreciated. Additionally, if I am wasting my time, let me know.

I apologize if this doesn't flow well, as I was basically making notes.


