Monday, August 17, 2020

Weird network issue - intermittent timeouts

We're experiencing a very strange issue for which I can't seem to find a common factor.

Multiple locations experience this issue.

All of a sudden, a random web page (including google.com) cannot be loaded, but only for a couple of seconds.

  • all locations use the same DNS servers. However, only 2 IP ranges seem to be affected
  • moving a client's IP into another range seems to solve the issue. But of course, this is only a workaround and not something we want to do for the entire network, since the actual issue remains unsolved.
  • routing seems to be the same for the affected and non-affected IP ranges on both the core switches and firewalls
  • there's no increased traffic at all. Quite the contrary, there have been fewer people the last few weeks.
  • all PCs have the same antivirus with the same antivirus policy
  • it also doesn't seem to be the internet provider: some of the locations go out through a completely different physical route and ISP
  • there are 2 firewalls acting independently; each flow goes through only one of them. They are the same model, though. Just to exclude some issues, we upgraded one to a newer version, but that changed nothing.
  • the first report came in 3-4 weeks ago. No major changes were made, except one (new routers from the ISP), but in theory that can't have any influence, and we already tried to rule it out by disabling the interface. The ISP also claims they do not use those two IP ranges on their side.
  • so far, I'm not ruling out that there are internal problems with internal applications, but I also have no reports about it yet.
  • traceroute gives the same results for affected and non-affected clients

I'm kind of running out of ideas on why the issue would last just a few seconds, occur completely at random, and affect only two IP ranges. I'm hoping someone has ideas on how to find out what kind of issue this even is.
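
One way to narrow down what kind of issue this is would be to run a small probe on one client in an affected range and one in a non-affected range, and compare the timestamps of the failures. The sketch below is only an illustration, not something we have running: the function names `probe` and `run`, the default port 443, and the 2-second timeout are all assumptions. It times the name lookup and the TCP connect separately, so a failure can be attributed to either DNS or the connection itself.

```python
import socket
import time


def probe(host, port=443, timeout=2.0):
    """Time name resolution and TCP connect separately for one target.

    Returns a dict; 'error' is None on success, otherwise it starts with
    'dns:' or 'connect:' to show which step failed.
    """
    result = {"host": host, "dns_ms": None, "connect_ms": None, "error": None}

    # Step 1: name resolution. Note that getaddrinfo has no timeout
    # parameter, so a very slow resolver shows up as a large dns_ms value.
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"dns: {exc}"
        return result
    result["dns_ms"] = (time.monotonic() - start) * 1000.0

    # Step 2: TCP connect to the first address returned.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:  # includes timeouts and refused connections
        result["error"] = f"connect: {exc}"
        return result
    result["connect_ms"] = (time.monotonic() - start) * 1000.0
    return result


def run(targets, port=443, interval=1.0, rounds=None):
    """Probe every target once per round and print one timestamped line each.

    rounds=None means loop until interrupted.
    """
    count = 0
    while rounds is None or count < rounds:
        for host in targets:
            r = probe(host, port=port)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, r["host"], r["dns_ms"], r["connect_ms"], r["error"])
        time.sleep(interval)
        count += 1
```

Calling something like `run(["google.com", "8.8.8.8"])` on both clients at the same time would probe a hostname (which needs DNS) and a literal IP address (which does not). If only the hostname fails during an outage, DNS is the likely suspect; if both fail, and only on the affected range, the problem is more likely somewhere in the path for that range.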


