Hey guys,
Let me start by saying I'm not a network expert, but this situation has me scratching my head and you guys might be able to help.
The infrastructure is 4 switches (Dell 2848): 2 for the LAN and 2 for the DMZ (for redundancy; different subnets). Each switch connects to its own NIC on the firewall (WatchGuard), with link aggregation.
This setup has been working fine for 2 years. Since last week we have been getting intermittent high latency followed by loss of connection. It happens randomly, normally lasts less than a couple of minutes, and comes back on its own. We lose connection to both DMZ switches and all servers inside the DMZ.
Here's what Nagios is reporting:
[09-25-2018 09:13:06] SERVICE ALERT: prd-server;Ping check;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 0.45 ms
[09-25-2018 09:11:09] SERVICE ALERT: prd-server;Ping check;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 28%, RTA = 2206.04 ms
[09-25-2018 09:10:52] SERVICE ALERT: prd-server;PHP Error Logs;CRITICAL;HARD;1;CRITICAL - Plugin timed out
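In case it helps anyone line these alerts up against switch or firewall logs, here is a minimal Python sketch that pulls the timestamp, state, packet loss and RTA out of lines like the ones above. The regexes and the name parse_alert are my own assumptions, based only on the three lines shown; this is not an official Nagios parser and it may need adjusting for other alert formats.

```python
import re
from datetime import datetime

# Pattern inferred from the sample lines only (an assumption):
# [MM-DD-YYYY HH:MM:SS] SERVICE ALERT: host;service;state;type;attempt;output
ALERT_RE = re.compile(
    r"\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)"
)
# Ping plugin output as it appears in the sample lines.
PING_RE = re.compile(r"Packet loss = (?P<loss>\d+)%, RTA = (?P<rta>[\d.]+) ms")


def parse_alert(line):
    """Return a dict for one SERVICE ALERT line, or None if it does not match."""
    match = ALERT_RE.search(line)  # search() tolerates any text before the "["
    if match is None:
        return None
    alert = match.groupdict()
    alert["ts"] = datetime.strptime(alert["ts"], "%m-%d-%Y %H:%M:%S")
    alert["attempt"] = int(alert["attempt"])
    ping = PING_RE.search(alert["output"])
    if ping:
        # Only ping checks carry packet loss and round-trip figures.
        alert["loss_pct"] = int(ping.group("loss"))
        alert["rta_ms"] = float(ping.group("rta"))
    return alert


if __name__ == "__main__":
    sample = (
        "[09-25-2018 09:11:09] SERVICE ALERT: prd-server;Ping check;"
        "CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 28%, RTA = 2206.04 ms"
    )
    print(parse_alert(sample))
```

With the timestamps parsed, the outage windows can be compared against IGMP or spanning-tree events on the switches for the same minutes.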
The problem seems to affect only the two DMZ switches and the servers located in the DMZ. The only thing that changed is that, 3 days prior, we enabled Bridge Multicast Filtering and IGMP Snooping Status with Auto Learn.
Could it be a dying firewall that has trouble routing packets between the subnets? A dying switch? A multicast problem?
Thanks