Tuesday, January 25, 2011

Disk Failure. Or Not.

I switched on my workstation this morning and was greeted with a ton of errors from one of the disks. Thank goodness for RAID! The obvious first conclusion was that the drive had failed. However, a closer look showed that one of the three RAID1 devices shared across the drives was still working. I suppose that's possible with a failed drive (I'm not exactly sure how), but it cast doubt on the obvious conclusion.
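A check like "which of the arrays are still healthy?" can be scripted. What follows is only a minimal sketch, assuming the arrays are Linux software RAID (md) and that their state is read from `/proc/mdstat`. The function name `degraded_arrays` and the sample text are made up for illustration and are not output from this machine.

```python
import re

# Illustrative /proc/mdstat-style text (hypothetical, not real output):
# three RAID1 arrays, of which only md0 still has both members.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      20482752 blocks [2/1] [U_]

md2 : active raid1 sda3[0]
      467788672 blocks [2/1] [U_]

unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status field shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ..."
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The member status looks like [UU] (all up) or [U_] (one missing).
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded

print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real system the same function could be fed the live file, for example `degraded_arrays(open("/proc/mdstat").read())`, again assuming md is what manages the arrays.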

I spent the next two hours looking for other possible hardware problems. The only other suspicious finding was that the SATA adapter had pulled about 1-2 mm out of its slot at the unsecured end of the card. Again, that's not enough to be conclusive. I ran the SMART tests on the disk and they passed. Unfortunately, only a failure of these tests is considered conclusive; a pass doesn't prove the drive is healthy. The RAID devices rebuilt successfully. A rebuild is a lengthy process that accesses the entire drive, so it should have triggered any hard failures.

In the end, the safest thing to do is buy a spare drive. Just in case.

