Thursday, November 28, 2019

X710-DA2 - one port suddenly not working - BIOS says "The driver failed to load because an unsupported module type was detected. Message code: 10696053409972224"

I have a 2U4N SuperMicro server that I'm using to set up a Ceph cluster. I have a single Intel X710-DA2 (Dell-branded) installed in each of the 4 nodes.

I ran Geekbench 5 on one node, in order to put the CPU through its paces.

However, partway through, the machine appeared to shut down abruptly.

Afterwards, when I tried to boot it up via the SuperMicro IPMI, it said:

> Performing power action failed. Please check.

https://i.imgur.com/If8zRgr.png

Has anybody seen this error message on a SuperMicro system before?

Anyway, I unplugged that node, waited a bit, and plugged it back in.

I tried to boot it up again, and this time, it said:

> Intel(R) 40GbE 1.7.19 is Unhealthy

https://i.imgur.com/FeV3OGk.png

Then, when I went into BIOS setup, it said:

> The driver failed to load because an unsupported module type was detected. Message code: 10696053409972224

https://i.imgur.com/v7jzK3l.png

I then tried to boot it up again and it did boot - but I seem to have lost network connectivity on one of the two SFP+ ports.

I tried swapping the SFP+ optic in that port, in case the optic was the problem - however, still no connectivity.

I then even tried swapping in another, new X710-DA2 - same issue.

Is there any way that benchmarking the box could somehow have damaged a PCIe slot on the motherboard?

Or have I missed something obvious in the troubleshooting steps above?

I suppose I can get an OPM (optical power meter) to test if there's light output from that port, right?
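Before buying a meter, a software-side check might be possible: `ethtool -m <interface>` dumps the SFP+ module's EEPROM, and modules that support digital diagnostics report transmit and receive optical power there. The sketch below is only a rough illustration, under a few assumptions: that the driver and NIC firmware expose the module EEPROM for this port (which may not be the case if the module is being rejected as unsupported), that the output uses the usual `<label> : <x> mW / <y> dBm` layout, and that the interface name (`eth1` here) is a placeholder. The helper names are made up for this example.

```python
import re
import subprocess

# Matches lines shaped like:
#   Laser output power    : 0.5012 mW / -3.00 dBm
# (assumed layout of `ethtool -m` output for modules with diagnostics)
POWER_LINE = re.compile(
    r"\s*(?P<label>.*?power.*?)\s*:\s*[\d.]+\s*mW\s*/\s*"
    r"(?P<dbm>-?(?:inf|[\d.]+))\s*dBm",
    re.IGNORECASE,
)


def parse_optical_power(text):
    """Return {label: dBm} for optical power readings, skipping thresholds."""
    readings = {}
    for line in text.splitlines():
        match = POWER_LINE.match(line)
        if not match:
            continue
        label = match.group("label").strip()
        if "threshold" in label.lower():
            continue  # alarm/warning limits, not live readings
        readings[label] = float(match.group("dbm"))
    return readings


def read_module_power(iface):
    """Run `ethtool -m` on a (hypothetical) interface name and parse it."""
    result = subprocess.run(
        ["ethtool", "-m", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_optical_power(result.stdout)


# Example (needs root and a real interface name, "eth1" is a placeholder):
#   print(read_module_power("eth1"))
```

A transmit reading of `-inf` dBm (or no readings at all) on the bad port, compared against the working port, would suggest the laser is not being enabled. That would not replace an OPM measurement, but it could narrow things down.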

Any other suggestions?


