Just received an email from a peer in the industry (
The Issue
When you upgrade the UCS firmware in your environment, some B200M4 blades may reboot spontaneously.
Cisco Bug: CSCut61527 – B200M4 could reboot on shallow_chkpt when BMC returns invalid FRU data. This is the Cisco article.
Affected firmware: 2.2(3) and 2.2(3a)A according to the KB. My upgrade was from 2.2(3c) to 2.2(3f).
Resolution
According to the KB, this issue should be resolved in 2.2(3g)A.
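If you want to see quickly which blades are still on a release older than the fix, a small version comparison can help. The following is only a minimal sketch: the function names are hypothetical, it assumes version strings shaped like 2.2(3f) or 2.2(3g)A with at most one patch letter, and it only tells you whether a release sorts before 2.2(3g). It does not talk to UCS Manager, and it is only meaningful for the 2.2(3) train discussed here; check the KB for the authoritative list of affected releases.

```python
import re

# Release the KB names as containing the fix (trailing bundle letter ignored).
FIXED_IN = "2.2(3g)"

# Matches e.g. "2.2(3)", "2.2(3f)" or "2.2(3g)A"; anything after ")" is ignored.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]?)\)")


def parse_ucs_version(text):
    """Turn '2.2(3f)' into a sortable tuple such as (2, 2, 3, 'f')."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maint, patch = match.groups()
    return (int(major), int(minor), int(maint), patch)


def predates_fix(version, fixed_in=FIXED_IN):
    """Return True if `version` sorts before the release containing the fix."""
    return parse_ucs_version(version) < parse_ucs_version(fixed_in)


if __name__ == "__main__":
    for v in ["2.2(3)", "2.2(3a)A", "2.2(3c)", "2.2(3f)", "2.2(3g)A"]:
        print(v, "predates fix" if predates_fix(v) else "at or past fix")
```

A release with no patch letter, such as 2.2(3), sorts before any lettered patch because the empty string compares lower than any letter.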
Haha, I am extremely familiar with this bug. Slide some M4s in and watch them mysteriously reboot many weeks or months later. Fail over the management process on the FI = more reboots.
The fix is to upgrade to the firmware you mention. Guess what: more reboots while you perform the upgrade to the new code.
The problem is with the MegaRAID controller and how the UCS firmware interprets certain FRU information from that controller.
Pretty rotten bug if you ask me. It took Cisco something like 8-12 weeks to fix it, and I've been on multiple phone calls a week with them about it since I reported the bug.
Ben