Wednesday, February 12, 2020

[Troubleshooting] DHCP failure/network dropping

Hello all, I don't know if this is the right place to post this, but after reading the subreddit rules it seems to fit here.

I have listed the hardware at the bottom of the post.

I have been racking my brain for about three weeks now over a client site whose network drops completely, at random, about once a week. I am starting to go bald from pulling out my hair, so I am reaching out to this community for any shared knowledge that might help remedy the issue. I will try to give as much detail as possible.

The client calls to say the network is down, with no LAN or WAN access. Because it is a small business and I am not on-site, I advise them to reset the modem/router until I can get there and diagnose the network. After a reset of the router/gateway (a Comcast Business modem) and the main switch, the network comes back up. When I arrive on-site I pull the modem's logs for the incident, but nothing is logged for the drop. I find this odd given that the network went down, so I run some tracerts, check the routes, and run pings to check latency and throughput; everything looks great. I advise that we keep an eye on it: it may have been a hiccup, and we shouldn't dive too deep into the issue until we can verify that there is a persistent problem.

Fast forward five days: the client called me in the morning to say that the network had gone down again overnight and stayed down until they reset the equipment. I headed over immediately, and when I arrived everything was working correctly. I advised that we reach out to Comcast to see whether they had any information or logs that we could not see on their equipment. Comcast said their equipment was running fine, with no outages or drops, and recommended replacing the switch, stating that the problem had nothing to do with their router/gateway. After discussing it with the client, we decided to install a brand-new switch in the hope that the old one was simply hanging intermittently.

Four days later the client calls again with the same issue. At this point I am starting to feel that the Comcast equipment has a problem and we are getting the runaround. When we contact Comcast support, they try to tell the client that the network is having issues because there are too many devices (about 22) for their modem, and that they would need to upgrade to a more expensive plan and a new router to remedy it. My client is not happy with this, as they understand enough to know when they are getting the runaround. Instead, the client decides to buy their own modem and router to replace the Comcast equipment, in the hope of resolving what we now believe is a problem with the Comcast router/gateway. I install and activate the new modem and set up the new router with a standard /24 and static IPs for the important devices (AP, server, etc.).
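Since the new router was set up with a /24 and hand-assigned static IPs for the AP and server, one cheap thing to rule out is a static address that overlaps the router's DHCP pool or collides with another device. Below is a hypothetical sanity check in Python, a sketch rather than anything used on this network: the 192.168.119.0/24 subnet and the 192.168.119.1 router address come from the tracert output in this post, but the pool boundaries and the static assignments are placeholders to be replaced with the values in the router's LAN settings.

```python
import ipaddress

# Subnet and router address taken from the tracert output in this post.
LAN = ipaddress.ip_network("192.168.119.0/24")
GATEWAY = ipaddress.ip_address("192.168.119.1")

# Hypothetical DHCP pool; copy the real start/end from the router's LAN page.
POOL_START = ipaddress.ip_address("192.168.119.100")
POOL_END = ipaddress.ip_address("192.168.119.199")

# Hypothetical static assignments; replace with what is actually configured.
STATICS = {
    "AP": "192.168.119.10",
    "Server": "192.168.119.20",
}


def check_statics(statics: dict) -> list:
    """Return human-readable problems found in the static assignments."""
    problems = []
    seen = {}  # address -> name of the first device that claimed it
    for name, text in statics.items():
        ip = ipaddress.ip_address(text)
        if ip not in LAN:
            problems.append(f"{name}: {ip} is outside {LAN}")
            continue  # the remaining checks only make sense inside the LAN
        if ip in (LAN.network_address, LAN.broadcast_address):
            problems.append(f"{name}: {ip} is the network or broadcast address")
        if ip == GATEWAY:
            problems.append(f"{name}: {ip} is the router's own address")
        if POOL_START <= ip <= POOL_END:
            problems.append(f"{name}: {ip} is inside the DHCP pool")
        if ip in seen:
            problems.append(f"{name}: {ip} is also assigned to {seen[ip]}")
        else:
            seen[ip] = name
    return problems


if __name__ == "__main__":
    issues = check_statics(STATICS)
    for issue in issues:
        print(issue)
    if not issues:
        print("No conflicts found in the static assignments.")
```

An empty report only rules out addressing mistakes in the static list; it says nothing about devices that were given a static address locally without being recorded here.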

Four to five days later, at around 1:30 am on a Saturday/Sunday morning, the network goes down again. The client calls on Monday to let me know: same story, they reset the equipment and everything is fine. I remote in and pull logs from a few machines and a NAS box, and find that the NAS logged DHCP as unreachable and then came back up with an APIPA address. I decide to write a script for the client that will test the network while it is down, thinking it might show where the drops are happening. The client runs the script during the next outage, and it is a complete drop; the machine cannot even reach the router:

Tracing route to 192.168.119.1 over a maximum of 30 hops

0 ****Desk [192.168.119.74]

1 ****Desk [192.168.119.74] reports: Destination host unreachable.
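The script itself isn't included in the post. For anyone wanting to capture similar evidence overnight, here is a minimal Python sketch of an outage logger (not the script used here): it pings the router at 192.168.119.1, records the machine's own IPv4 address, and flags a 169.254.x.x (APIPA) address, which Windows self-assigns when it cannot obtain a DHCP lease. The probe interval, the log file name, the `--watch` flag, and all function names are assumptions for illustration.

```python
import datetime
import ipaddress
import platform
import socket
import subprocess
import sys
import time
from typing import List, Optional

GATEWAY = "192.168.119.1"   # router address from the tracert output above
INTERVAL_SECONDS = 30       # assumption: probe every 30 seconds
LOG_FILE = "netwatch.log"   # hypothetical log file name
APIPA = ipaddress.ip_network("169.254.0.0/16")  # self-assigned when DHCP fails


def ping_command(host: str, system: str) -> List[str]:
    """Build a one-packet ping command for the given OS name."""
    if system == "Windows":
        return ["ping", "-n", "1", "-w", "1000", host]  # -w timeout is in ms
    return ["ping", "-c", "1", host]  # overall timeout enforced by subprocess


def gateway_reachable(host: str) -> bool:
    """Return True only if the host answered a single echo request."""
    system = platform.system()
    try:
        result = subprocess.run(ping_command(host, system),
                                capture_output=True, text=True, timeout=5)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    if system == "Windows":
        # Windows ping may exit 0 on a "Destination host unreachable" reply,
        # so also require a real echo reply line containing "TTL=".
        return result.returncode == 0 and "TTL=" in result.stdout
    return result.returncode == 0


def local_address(gateway: str) -> Optional[str]:
    """Local IPv4 address the OS would use to reach the gateway, if any."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect((gateway, 9))  # UDP connect only picks a route; nothing is sent
            return s.getsockname()[0]
    except OSError:
        return None  # e.g. no route to the gateway at all


def format_entry(ts: datetime.datetime, reachable: bool,
                 addr: Optional[str]) -> str:
    """One log line: timestamp, gateway status, and local address."""
    status = "UP" if reachable else "DOWN"
    if addr is None:
        local = "none"
    elif ipaddress.ip_address(addr) in APIPA:
        local = f"APIPA {addr} (no DHCP lease)"
    else:
        local = addr
    return f"{ts.isoformat(timespec='seconds')} gateway={status} local={local}"


def probe() -> str:
    """Take one measurement and return it as a log line."""
    return format_entry(datetime.datetime.now(),
                        gateway_reachable(GATEWAY), local_address(GATEWAY))


def watch() -> None:
    """Append one log line every INTERVAL_SECONDS until interrupted (Ctrl+C)."""
    while True:
        with open(LOG_FILE, "a", encoding="utf-8") as log:
            log.write(probe() + "\n")
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    if "--watch" in sys.argv[1:]:
        watch()
    else:
        print(probe())  # single check by default
```

Run once with no arguments for a quick check, or with `--watch` to log continuously. Left running on a couple of machines, the timestamps could show whether the router stops answering before or after the machines fall back to APIPA, which is the distinction the single tracert above can't make.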

At this point we have a new modem, router, and switch. I suspect there may be an intermittent power issue, so we replace the surge protector with a new UPS with battery backup. The problem recurs a few days later, once again in the middle of the night.

While doing research I came across this article, but it is specific to HomeGroups, and to my knowledge there are none on the network: https://answers.microsoft.com/en-us/windows/forum/all/w10-losing-ip-connection-to-isp-drops-dhcp-and/fdb417d0-7b10-4dd3-8393-cefa46aa392c

One last bit of info: in December we upgraded the majority of the machines from Windows 7 to Windows 10, since, like most clients, they had wanted to wait until they had to migrate.

Here is the hardware in the network:

Netgear GS108 Gigabit desktop switch

Asus RT-ARCH13 Router

Motorola MB7420 Modem

Ubiquiti LR-AC AP

I am at a loss. It is acting as if the switch is failing, but I have a hard time believing that two switches failed in exactly the same way, especially a brand-new one straight out of the box. Has anyone here seen issues like this before, and what did you find the culprit to be? This is a small business with 5-10 people working throughout the day, and I cannot find anything in any of the logs that points to the issue.

Router logs show no drops, modem logs show no issues, and Windows logs state that DHCP was unreachable on some machines but not all.

Any help or ideas would be appreciated.