I have a 2U4N SuperMicro server that I'm using to set up a Ceph cluster. Each of the 4 nodes has a single Intel X710-DA2 (Dell-branded) installed.
I ran Geekbench 5 on one node, in order to put the CPU through its paces.
However, partway through, the machine appeared to shut down abruptly.
Afterwards, when I tried to boot it up via the SuperMicro IPMI, it said:
> Performing power action failed. Please check.
https://i.imgur.com/If8zRgr.png
Has anybody seen this error message on a SuperMicro system before?
Anyway, I unplugged that node, waited a bit, and plugged it back in.
I tried to boot it up again, and this time, it said:
> Intel(R) 40GbE 1.7.19 is Unhealthy
https://i.imgur.com/FeV3OGk.png
Then, when I went into BIOS setup, it said:
> The driver failed to load because an unsupported module type was detected. Message code: 10696053409972224
https://i.imgur.com/v7jzK3l.png
I then tried to boot it up again and it did boot, but I seem to have lost network connectivity on one of the two SFP+ ports.
I tried swapping the SFP+ optic in that port, in case that was the cause, but there was still no connectivity.
I then even tried swapping to another new X710-DA2 - same issue.
Is there any way that benchmarking the box could somehow have damaged a PCIe slot on the motherboard?
Or have I missed something obvious in the troubleshooting steps above?
I suppose I can get an OPM (optical power meter) to test if there's light output from that port, right?
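Before buying an OPM, a software-side check might be possible: if the optic supports digital diagnostics (DOM) and the NIC driver exposes the module EEPROM, `ethtool -m <interface>` reports the transmit and receive optical power. Below is a minimal Python sketch that pulls those two values out of the output. It assumes the output contains lines labelled "Laser output power" and "Receiver signal average optical power" in the usual `x.xxxx mW / y.yy dBm` form; the sample text and the interface name in the usage comment are made up for illustration, and whether the X710 on this firmware supports `ethtool -m` at all is something I haven't confirmed.

```python
import re
import subprocess

# Matches the two optical power lines that `ethtool -m` typically prints
# for an SFP/SFP+ module with DOM. The exact labels are an assumption.
POWER_LINE = re.compile(
    r"^\s*(Laser output power|Receiver signal average optical power)"
    r"\s*:\s*([0-9.]+)\s*mW",
    re.MULTILINE,
)


def parse_optical_power(ethtool_output):
    """Return {label: power in mW} for each power line found in the text."""
    return {m.group(1): float(m.group(2)) for m in POWER_LINE.finditer(ethtool_output)}


def read_module_power(iface):
    """Run `ethtool -m <iface>` (needs root) and parse the power readings."""
    result = subprocess.run(
        ["ethtool", "-m", iface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_optical_power(result.stdout)


# Made-up sample output, only to show the expected shape of the text.
SAMPLE = (
    "\tLaser output power                        : 0.5861 mW / -2.32 dBm\n"
    "\tReceiver signal average optical power     : 0.0000 mW / -inf dBm\n"
)

# Usage on a live system (interface name is hypothetical):
#   print(read_module_power("enp1s0f1"))
```

A transmit reading near zero would point at the port or optic not emitting light, while a healthy transmit value with zero receive power would point more toward the fibre or the far end.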
Any other suggestions?