Thursday, June 17, 2021

Lost connectivity to servers, both physical and VMs

Hi everyone

I posted this a while back, but it was taken down for low quality.

My apologies for that. I've since done a lot more troubleshooting, and hopefully the problem will now be clearer to someone more knowledgeable who can help me fix it.

The context is this: we have 4 physical hosts that we used for our old ERP. Each runs 1 or 2 VMs for specific parts of the software. These hosts and their VMs are on their own VLAN. (The network is wide open VLAN-wise, though: there are no ACL rules restricting access to them from a different VLAN.)

One of these hosts and its VM have no connectivity issues, apart from the host's iDRAC web interface being inaccessible for some reason. Both the physical host and its VM can be pinged.

The other 3, however, cannot be pinged from outside (neither the physical hosts nor the VMs on them). The VMs on these 3 hosts can be pinged from within the same LAN segment, specifically from other VMs on the same subnet, but they cannot reach their default gateway. They also cannot ping the physical host they are running on, even though those hosts are in the same subnet and VLAN.

The default gateway for these problematic hosts can be pinged from literally any outside device.

My guess is that the problem lies in the Hyper-V networking settings for these VMs. The VMs that do NOT work are configured with Virtual Switch Tagging, and their physical hosts are connected to the physical switch through trunk ports.

The one working physical host and its working VM are configured with External Switch Tagging, and are connected to the physical switch through access ports.
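To make the difference between the two setups concrete, here is a simplified Python model of how a switch port typically classifies incoming frames. It is a sketch, not any vendor's exact behaviour: the `SwitchPort` class, the `ingress_vlan` function and the VLAN numbers (20 for the ERP VLAN, 1 as the trunk's native VLAN) are hypothetical, since the actual VLAN IDs and switch configuration aren't known. With External Switch Tagging the host sends untagged frames and the access port assigns the VLAN; with Virtual Switch Tagging the Hyper-V switch adds an 802.1Q tag, and the trunk port only forwards it if that VLAN ID is in its allowed list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SwitchPort:
    """Simplified model of a physical switch port's VLAN settings (hypothetical)."""
    mode: str                              # "access" or "trunk"
    access_vlan: Optional[int] = None      # VLAN assigned to untagged frames on an access port
    native_vlan: Optional[int] = None      # VLAN assigned to untagged frames on a trunk port
    allowed_vlans: FrozenSet[int] = frozenset()  # VLAN IDs a trunk port will carry


def ingress_vlan(port: SwitchPort, frame_tag: Optional[int]) -> Optional[int]:
    """Return the VLAN an incoming frame is placed in, or None if the port drops it.

    frame_tag is the 802.1Q VLAN ID carried by the frame, or None if untagged.
    """
    if port.mode == "access":
        # Access ports expect untagged frames and put them in the access VLAN.
        # Dropping tagged frames here is typical but vendor-dependent.
        return port.access_vlan if frame_tag is None else None
    if port.mode == "trunk":
        # Untagged frames fall into the native VLAN; tagged frames keep their tag.
        vlan = port.native_vlan if frame_tag is None else frame_tag
        # The trunk only forwards VLANs that are in its allowed list.
        return vlan if vlan in port.allowed_vlans else None
    raise ValueError(f"unknown port mode: {port.mode}")


# Hypothetical VLAN ID for the ERP VLAN (not known from the post).
ERP_VLAN = 20
access_port = SwitchPort(mode="access", access_vlan=ERP_VLAN)
trunk_port = SwitchPort(mode="trunk", native_vlan=1, allowed_vlans=frozenset({1, ERP_VLAN}))
stale_trunk = SwitchPort(mode="trunk", native_vlan=1, allowed_vlans=frozenset({1}))

print(ingress_vlan(access_port, None))       # External Switch Tagging: 20
print(ingress_vlan(trunk_port, ERP_VLAN))    # Virtual Switch Tagging, VLAN allowed: 20
print(ingress_vlan(stale_trunk, ERP_VLAN))   # VLAN missing from trunk: None (dropped)
print(ingress_vlan(trunk_port, None))        # untagged vNIC on a trunk: 1 (native), not 20
```

The last two cases show how a Virtual Switch Tagging setup can fail silently: if the trunk's allowed VLAN list, or the VLAN ID set on a host or VM network adapter, no longer matches, frames either get dropped or end up in the wrong VLAN, while the access-port setup is unaffected.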

One more detail that may or may not be relevant: the traffic lights on the Ethernet ports of the NON-working physical hosts are flashing like crazy, as are the corresponding port lights on the physical switch. The working host just shows a solid green link light and no traffic light (except, of course, when I'm pinging it or RDP-ing into it). This may sound like a broadcast storm or a loop, but I doubt that's the case, since that switch has a single uplink to the main core switch. There is no mesh; our network is physically more of a star topology.

The problem is that I inherited this whole set up from someone much more experienced than I, and he had already left before I even got here. Nothing was left explaining how/why everything is set up as it is and there is no documentation for anything IT related (except for passwords to various systems).

If you guys have no suggestions, I will connect to the non-communicating hosts directly, through the console or with a separate monitor, and change their Hyper-V network settings to match the ones on the working host. But because I have no idea why it was set up this way, I can't tell whether that will break something else, so I'm putting it off until I run out of ideas.

Also, since these worked until fairly recently (4 weeks ago), I don't think the configuration is what's messing things up, as it has not been touched or fiddled with by anyone.

It also doesn't help that my colleagues only told me 3 weeks after the servers stopped working (we only use them occasionally for archival purposes; our new ERP is in the cloud), so if something that happened around then caused this, I can't remember it now.

Thanks everyone and I hope this is enough info for troubleshooting.
