Monday, July 1, 2019

2960X Stack Upgrade Issue

BACKGROUND

We have begun updating our 2960X switches from 15.2.2E7 to 15.2.4E7. The main trigger was that we wanted to be able to run test cable-diagnostics tdr on 1 Gbps ports that were running at 100 Mbps. At the time, 15.2.4E7 was the latest TAC star release. (I realize that E8 is now available, but we had already tested E7 in production without running into issues, so we opted to stick with E7 rather than re-test on E8.)

Using the SWIM features on Prime Infrastructure 3.4.1, Update 2, we distributed the .bin files to our standalone switches and the .tar files to our stack switches, and made them the bootable image. Then, over the course of several days, we began activating those images (activation is the SWIM process that reboots the switches so they load the new image). We do it in phases over the course of 5 or so nights, with around 10-15 switches in each phase. That way, if we run into issues, we don't have a buttload of problems to take care of (there are only two of us doing Networking). That's also the reason we test the code on one of our stacks for a month or more before we deploy it to the rest of our switches.
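
For anyone who wants to plan a similar phased rollout, the batching itself can be sketched in a few lines of Python. This is only an illustration: the plan_phases helper and the switch names are hypothetical, and nothing here talks to Prime Infrastructure or SWIM.

```python
def plan_phases(switches, per_phase=15):
    """Split a list of switches into consecutive phases of at most per_phase each."""
    if per_phase < 1:
        raise ValueError("per_phase must be at least 1")
    return [switches[i:i + per_phase] for i in range(0, len(switches), per_phase)]


if __name__ == "__main__":
    # Hypothetical inventory; the names are placeholders, not our real switches.
    inventory = [f"sw-{n:02d}" for n in range(1, 61)]
    for night, phase in enumerate(plan_phases(inventory, per_phase=12), start=1):
        print(f"Night {night}: {len(phase)} switches ({phase[0]} .. {phase[-1]})")
```

Keeping each phase small is the whole point: if something breaks, the blast radius is one night's worth of switches.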

ISSUE DESCRIPTION

So far, out of 18 switch stacks, we've had 3 different stacks run into the following issue:

The switch stack will reboot and load the new code. The Master switch will update without issue. The member switch (each stack that has had this issue so far has been a stack of 2 switches) will update, but when it boots, it fails ACT2 Authentication during POST and spits out the following log message:

%ILET-1-AUTHENTICATION_FAIL: This Switch may not have been manufactured by Cisco or with Cisco's authorization. This product may contain software that was copied in violation of Cisco's license terms. If your use of this product is the cause of a support issue, Cisco may deny operation of the product, support under your warranty or under a Cisco technical support program such as Smartnet. Please contact Cisco's Technical Assistance Center for more information.
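
If you save console captures during an upgrade, checking them for this message is easy to script. The following is a minimal sketch, assuming the capture is plain text; the find_auth_failures helper and the sample capture are hypothetical, and only the %ILET-1-AUTHENTICATION_FAIL mnemonic comes from the actual log.

```python
import re

# Mnemonic taken from the log message above (facility ILET, severity 1).
AUTH_FAIL = re.compile(r"%ILET-1-AUTHENTICATION_FAIL\b")


def find_auth_failures(console_text):
    """Return (line_number, line) pairs for every line containing the ILET message."""
    hits = []
    for number, line in enumerate(console_text.splitlines(), start=1):
        if AUTH_FAIL.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Placeholder capture: the first and last lines stand in for other boot output.
    capture = "\n".join([
        "(other boot output)",
        "%ILET-1-AUTHENTICATION_FAIL: This Switch may not have been manufactured by Cisco",
        "(other boot output)",
    ])
    for number, line in find_auth_failures(capture):
        print(f"line {number}: {line}")
```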

Once it finishes booting, the member switch will no longer stack with the Master. It should be noted that all of these switches are authentic Cisco switches - we only order from a Cisco-Authorized VAR (ConvergeOne, in this case). Luckily, we keep several spares available in case we need to replace a failed switch, so we haven't had any extended outages due to this issue. We've tried doing several hard resets by physically removing the power cable and plugging it back in, but still get %ILET-1-AUTHENTICATION_FAIL on boot. We tried upgrading to the very latest IOS train of 15.2.7E0a, hoping that the bootloader upgrade would fix the problem, but it did not. All of our troubleshooting was done with the stacking module removed and all cabling removed (except for the console cable).

It appears to be some iteration of CSCur56395, which was supposedly fixed in 15.2.2E2. Since we were already on 15.2.2E7 and upgraded to 15.2.4E7, either the bug was not actually fixed or this is a different issue. So far, it has only affected member switches in stacks of two, not our standalone switches, and not every 2960X stack of two has been affected.
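
The release comparison behind that reasoning can be written out as a short Python sketch. It assumes release strings in the dotted form used in this post (for example 15.2.4E7 or 15.2.7E0a) and assumes that a fix carries forward into every later release, which is a simplification. The parse_release and includes_fix helpers are hypothetical names for illustration.

```python
import re

# Dotted release form used in this post: major.minor.maintenance, "E", rebuild number,
# and an optional trailing letter (as in 15.2.7E0a).
_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)E(\d+)([a-z]?)$")


def parse_release(text):
    """Turn a string like '15.2.4E7' into a tuple that sorts in release order."""
    match = _RELEASE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    major, minor, maintenance, rebuild, suffix = match.groups()
    return (int(major), int(minor), int(maintenance), int(rebuild), suffix)


def includes_fix(running, fixed_in):
    """True if the running release is at or after the fixed-in release.

    Assumption: fixes carry forward to all later releases in this ordering.
    """
    return parse_release(running) >= parse_release(fixed_in)


if __name__ == "__main__":
    for release in ("15.2.2E7", "15.2.4E7", "15.2.7E0a"):
        print(release, includes_fix(release, "15.2.2E2"))
```

By this ordering, every release we ran should already contain the fix, which is why we suspect the fix was incomplete or that this is a separate bug.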

RESOLUTION

At the moment, our only option to correct this issue has been to call TAC and open a Warranty RMA on the affected switches and get them replaced. This thread appears very similar to the issue we're seeing:
https://community.cisco.com/t5/switching/ilet-1-authentication-fail-on-2960x-after-ios-update/td-p/3682974


