Tuesday, January 12, 2021

Random link flapping issues on some switch ports. How can I troubleshoot this mess?

Configuration:

  • 3 stacked HP switches (MAIN)
  • 2 other HP switches in the same network closet, connected to each other with an Ethernet trunk (PRODUCTION)
  • MAIN and PRODUCTION are connected through a fiber trunk link
  • STP is enabled

Issue:

A new machine was installed and connected to one of the PRODUCTION switches. After a few days of tests, the machine's technician complained that our network does not seem very stable.

Investigation:

So we checked the logs of the HP switches and found many "port status change" events following this kind of pattern:

I 01/11/21 13:08:02 00076 ports: ST1-CMDR: port 3/26 is now on-line
I 01/11/21 13:08:53 00077 ports: ST1-CMDR: port 3/26 is now off-line
I 01/11/21 13:08:57 00076 ports: ST1-CMDR: port 3/26 is now on-line
I 01/11/21 13:08:58 00077 ports: ST1-CMDR: port 3/26 is now off-line
I 01/11/21 13:09:01 00076 ports: ST1-CMDR: port 3/26 is now on-line
W 01/11/21 13:09:01 02672 FFI: ST1-CMDR: port 3/26-Excessive link state transitions
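To get a feel for how long the link actually stays down, the off-line/on-line pairs in an excerpt like this can be timed. The following is only a minimal Python sketch: the regular expression assumes every "ports:" line has exactly the layout shown above, the date is assumed to be month/day/year, and the names `EVENT_RE` and `down_durations` are made up for illustration. The log timestamps have one-second resolution, so the results are approximate.

```python
import re
from datetime import datetime

# Assumed layout: "<sev> MM/DD/YY HH:MM:SS <id> ports: <member>: port <p> is now <state>"
EVENT_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \d{5} ports: \S+: "
    r"port (\S+) is now (on-line|off-line)"
)

def down_durations(text):
    """Return (port, seconds_down) for each off-line -> on-line pair."""
    went_down = {}  # port -> timestamp of the pending off-line event
    result = []
    for stamp, port, state in EVENT_RE.findall(text):
        ts = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        if state == "off-line":
            went_down[port] = ts
        elif port in went_down:
            # Link came back: measure how long it was reported down.
            result.append((port, (ts - went_down.pop(port)).total_seconds()))
    return result

LOG = """\
I 01/11/21 13:08:02 00076 ports: ST1-CMDR: port 3/26 is now on-line
I 01/11/21 13:08:53 00077 ports: ST1-CMDR: port 3/26 is now off-line
I 01/11/21 13:08:57 00076 ports: ST1-CMDR: port 3/26 is now on-line
I 01/11/21 13:08:58 00077 ports: ST1-CMDR: port 3/26 is now off-line
I 01/11/21 13:09:01 00076 ports: ST1-CMDR: port 3/26 is now on-line
W 01/11/21 13:09:01 02672 FFI: ST1-CMDR: port 3/26-Excessive link state transitions
"""

for port, seconds in down_durations(LOG):
    print(port, seconds)
```

For this excerpt the two outages on port 3/26 come out at roughly 4 and 3 seconds.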

We collected all the logs in one Excel spreadsheet and realized that:

  • These events happen seemingly at random on all the switches
  • Some days we have hundreds of these events and other days only a few; when the company is closed we have none (surprise?)
  • Some ports are more affected than others; we even made a chart
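The same tally could also be scripted instead of built by hand in a spreadsheet. This is a rough Python sketch, not something we actually ran: it assumes all "ports:" lines look like the excerpt above, that the date is month/day/year, and it counts each "off-line" event as one flap. The names `LINE_RE`, `parse_events` and `count_flaps` are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout of a "port status change" line, taken from the log excerpt.
LINE_RE = re.compile(
    r"[IW] (?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \d{5} ports: "
    r"\S+: port (?P<port>\S+) is now (?P<state>on-line|off-line)"
)

def parse_events(text):
    """Extract (timestamp, port, state) tuples from raw switch log text."""
    events = []
    for m in LINE_RE.finditer(text):
        ts = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
        events.append((ts, m.group("port"), m.group("state")))
    return events

def count_flaps(events):
    """Count off-line events per port and per calendar day."""
    per_port = Counter()
    per_day = Counter()
    for ts, port, state in events:
        if state == "off-line":
            per_port[port] += 1
            per_day[ts.date().isoformat()] += 1
    return per_port, per_day

SAMPLE = """\
I 01/11/21 13:08:02 00076 ports: ST1-CMDR: port 3/26 is now on-line
I 01/11/21 13:08:53 00077 ports: ST1-CMDR: port 3/26 is now off-line
I 01/11/21 13:08:57 00076 ports: ST1-CMDR: port 3/26 is now on-line
I 01/11/21 13:08:58 00077 ports: ST1-CMDR: port 3/26 is now off-line
I 01/11/21 13:09:01 00076 ports: ST1-CMDR: port 3/26 is now on-line
"""

per_port, per_day = count_flaps(parse_events(SAMPLE))
print(per_port.most_common())  # ports sorted by number of flaps
print(sorted(per_day.items()))  # flaps per day, oldest first
```

Sorting ports by flap count and days by date gives the same two views as the chart: which ports are worst, and whether the flapping follows working hours.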

Some of the affected hosts are Windows computers, so we checked Event Viewer for "link loss" events. What's weird is that most of the time there were no warnings: the switch logged the port going off-line for a moment, but from the computer's point of view the link was still up.

So it seems we only noticed this problem now because we connected a device that is more sensitive to this kind of issue.

How can we troubleshoot this?


