EMC CLARiiON faulting devices

By Yendis on March 19, 2013

Whilst doing a physical inspection of our datacentre today, I noticed that multiple disk enclosures were showing an amber warning light. After checking Navisphere (and after picking myself up off the floor) I found the following errors. As you can see from the diagram, six disk enclosures had faulted.

I then expanded one of the disk enclosures and, as you can see, the majority of disks attached to that enclosure were also shown as faulted.

Checking the alerts in Navisphere as well, there were 75 errors indicating disk failures.

Even though we had all of these errors in Navisphere, the data volumes/LUNs were still accessible in vSphere, and all systems continued to run without any issues or performance problems.

After running SP Collects (used for collecting logs and troubleshooting) and talking with Dell/EMC, it was concluded that the issue was with backend B on Bus 1. This meant reseating all HSSDC cables and LCCs (link control cards) on backend B of Bus 1, starting from the bottom and working upwards, i.e. Bus 1 Enclosure 0, then Bus 1 Enclosure 1 and so on…
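If you have a lot of enclosures, it can help to write the reseat order down before you start. Below is a minimal Python sketch of the bottom-up ordering described above. The `reseat_order` function and the list of enclosure labels are hypothetical, made up for illustration; this is not output from Navisphere or any EMC tool.

```python
import re

def reseat_order(enclosures, bus):
    """Return the enclosures on the given bus, lowest enclosure number first."""
    # Labels are assumed to look like "Bus 1 Enclosure 0" (hypothetical format).
    pattern = re.compile(r"Bus (\d+) Enclosure (\d+)")
    on_bus = []
    for label in enclosures:
        match = pattern.fullmatch(label)
        if match and int(match.group(1)) == bus:
            on_bus.append((int(match.group(2)), label))
    # Sorting on the enclosure number gives the bottom-up order.
    return [label for _, label in sorted(on_bus)]

# Example input only; replace with the enclosures in your own array.
faulted = [
    "Bus 1 Enclosure 2",
    "Bus 0 Enclosure 1",
    "Bus 1 Enclosure 0",
    "Bus 1 Enclosure 1",
]

for step, label in enumerate(reseat_order(faulted, bus=1), start=1):
    print(f"{step}. Reseat HSSDC cables and LCC on {label}")
```

Enclosures on other buses are ignored, so only the Bus 1 chain ends up in the checklist.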

NOTE: These HSSDC cables and LCCs are hot-swappable, and reseating them will not cause an outage on your storage. Depending on your environment, you may experience some performance issues or an I/O bottleneck during the procedure.

After reseating the cards and cables and giving the CX3-40 time to update/refresh, the errors cleared by themselves. The cause was a hung LCC in one of the DAE3Ps.

All looks healthy now 🙂