Tuesday, July 7, 2020

Odd network disconnect issues on corporate network, I think I know what's wrong but don't know how to explain it to higher-ups.

Hi all,

I've been trying to track down a very infrequent network-based issue, and we all know those are the worst type of problems.

We have two buildings (A and B) on a single subnet, bridged by a Cambium wireless PtP link. The router, firewall, and WAN connection are all located in Building A. Users in Building B are reporting file access problems with a server that is also in Building B. The main symptom: an Access database (hosted on Building B's server) that is left open on any computer in Building B will, after "a while", throw a seemingly random error along the lines of "unable to access file, network problem" (paraphrased).

Over the past year or so, I have noticed the signal level on the PtP link slowly dropping (which makes sense given the growing trees and new housing developments in close proximity). After resetting at night and checking the next morning, the wireless statistics page shows at least 10% "retransmitted packets" over the wireless link between Buildings B and A.
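For reference, this is roughly how I would turn the counters on that page into a percentage. It is only a sketch: the counter names and the numbers are made up for illustration, and I am assuming the page reports retransmissions as a share of all packets sent over the link since the last reset.

```python
def retransmit_percent(packets_sent: int, packets_retransmitted: int) -> float:
    """Return retransmitted packets as a percentage of packets sent."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return 100.0 * packets_retransmitted / packets_sent


# Hypothetical overnight counters, not real readings from the link.
print(retransmit_percent(1_200_000, 126_000))  # 10.5
```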

To me, that makes the PtP link suspect numero uno for the network cutouts. However, higher-ups are telling me that since Building B is having problems with a server in Building B, the wireless link has nothing to do with it, and it must be a configuration issue on the file server or one of the switches. I've looked over as much as I think I can (server logs, switchport logs and statistics) and I haven't identified anything else that could cause these issues.

From what I understand, if your devices already have IPs (through a cached DHCP lease or a static assignment), they can still connect to and communicate with some devices across the layer 2 network even if layer 3 access drops. However, with protocols and services at higher levels (DNS, AD, etc.), random packet loss can cause temporary connection problems, since those services rely on what I guess you could call "keep-alive" packets. Am I incorrect in this assumption?
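To put rough numbers on that idea, here is a back-of-the-envelope sketch for traffic that actually crosses the link. It assumes every packet is lost independently with the same probability (real wireless loss tends to come in bursts, so this is optimistic) and that a request only fails when the original send and every retry are all lost. The retry and request counts are illustrative guesses, not values taken from any real protocol stack.

```python
def request_failure_prob(loss: float, retries: int) -> float:
    """Chance that one request fails: the first send and all retries are lost."""
    return loss ** (retries + 1)


def session_failure_prob(loss: float, retries: int, requests: int) -> float:
    """Chance that at least one of `requests` requests fails during a session."""
    per_request_ok = 1.0 - request_failure_prob(loss, retries)
    return 1.0 - per_request_ok ** requests


# Illustrative values only: 10% loss, 3 retries, 10,000 requests in a session.
print(f"{request_failure_prob(0.10, 3):.4f}")          # 0.0001
print(f"{session_failure_prob(0.10, 3, 10_000):.2f}")  # 0.63
```

Under those assumptions a single request almost never fails, but a session that stays open long enough to make thousands of requests has a good chance of hitting at least one failure, which would fit the "after a while" pattern.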

I guess what I'm asking is this: I'm 99.9% sure I know what the problem is, but I'm getting pushback to look for other issues, since others claim my theory is incorrect or needs more data to back it up. Any recommendations? Maybe I'm completely wrong that the wireless link is the issue? Should I be looking at other devices on the network first, before I claim that the PtP link alone is to blame?
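If more data is what they want, one option I am considering is a small probe that timestamps every failed connection attempt, so the failures can be lined up against the PtP statistics. This is a minimal sketch, not a finished tool. It assumes the target accepts TCP connections on the port being probed (445 for SMB on a Windows file server), and the host name `fileserver-b` used below is a placeholder.

```python
import socket
import time
from datetime import datetime


def probe(host: str, port: int, timeout: float = 2.0):
    """Try one TCP connection; return (succeeded, elapsed_ms)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        ok = False
    return ok, (time.monotonic() - start) * 1000.0


def failure_percent(results):
    """Percentage of failed probes in a list of (succeeded, elapsed_ms) pairs."""
    if not results:
        return 0.0
    failed = sum(1 for ok, _ in results if not ok)
    return 100.0 * failed / len(results)


def run(host: str, port: int, interval_s: float = 5.0, count: int = 10):
    """Probe `count` times, printing one timestamped line per attempt."""
    results = []
    for _ in range(count):
        ok, ms = probe(host, port)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {'ok' if ok else 'FAIL'} {ms:.1f} ms")
        results.append((ok, ms))
        time.sleep(interval_s)
    return results
```

Calling `run("fileserver-b", 445, interval_s=5.0, count=720)` would cover roughly an hour. Running the same probe from a Building B PC against the Building B server, and separately against something in Building A, should show whether the failures only appear on traffic that crosses the wireless link.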

Thank you everybody :)


