Fixing a Degraded RAID 5 Array After a Failed Disk
Posted by Adam Hayes
A failed hard drive on a Dell PowerEdge server left a RAID 5 array on a PERC 4/di controller in a degraded state. Here are the steps we took to rebuild the array.
We had a hard drive crash on one of our Dell PowerEdge servers this week. Fortunately, we had no downtime because it only took out one drive of our RAID 5 configuration. Once we installed the new drive everything rebuilt, but the array disks still showed a degraded state. After trying a lot of options, like reseating the hot-swap drive, rebuilding the RAID array again, and running a lot of diagnostics on the drives, we were still getting a predictive drive failure with error code 2094 in Dell's OpenManage alert log.
We decided to try another drive. It worked. The first "new" drive we put in was failing the S.M.A.R.T. test and showing a predictive failure, so replacing the broken "new" drive was the trick. When it was all done, these were the steps to get the array disk out of degraded mode on the PERC 4/di controller:
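We could have caught the bad replacement drive sooner by querying S.M.A.R.T. health from the OS with smartctl. This is a hedged sketch: the smartctl invocation is commented out because the drive path and megaraid device number are assumptions for this controller, and the small helper below just shows how a health line can be reduced to a pass/fail flag for monitoring scripts:

```shell
# Hedged sketch: the actual query would look something like this (path and
# megaraid device number are assumptions, so it is left as a comment):
#   smartctl -H -d megaraid,0 /dev/sda

# parse_smart_health: reduce a "smartctl -H" result line to OK or FAILING.
parse_smart_health() {
    if printf '%s\n' "$1" | grep -q 'PASSED'; then
        echo "OK"
    else
        echo "FAILING"
    fi
}

parse_smart_health "SMART overall-health self-assessment test result: PASSED"   # prints OK
parse_smart_health "SMART overall-health self-assessment test result: FAILED!"  # prints FAILING
```

A cron job wrapping a check like this would have flagged the predictive failure before we spent time on reseats and rebuilds.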
- Remove the bad disk (it's hot-swappable, so there's no need to power down the machine)
- Replace it with a good disk of the same size or larger
- The controller automatically rebuilds the array
- Once the rebuild reaches 100%, the array may still show a degraded state
- Clear the logs
- Clear the alert log
- Do a global rescan
- Everything should now show a good status and run without problems
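We worked through the steps above in the OpenManage GUI, but they have rough command-line equivalents in OpenManage Server Administrator's omreport/omconfig tools. The exact subcommands and controller=0 below are assumptions and need OMSA installed, so they are left as comments; the runnable helper just shows how a status line could be checked automatically:

```shell
# Hedged OMSA CLI sketch of the steps above (controller id 0 is an assumption;
# the commands are commented out since they require Dell OpenManage):
#   omreport storage vdisk controller=0                     # check virtual disk state
#   omconfig system alertlog action=clear                   # clear the alert log
#   omconfig storage controller action=rescan controller=0  # global rescan

# vdisk_needs_attention: flag a degraded or failed state from a status line.
vdisk_needs_attention() {
    case "$1" in
        *Degraded*|*Failed*) echo "ATTENTION" ;;
        *)                   echo "OK" ;;
    esac
}

vdisk_needs_attention "State : Degraded"   # prints ATTENTION
vdisk_needs_attention "State : Ready"      # prints OK
```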
I was very nervous about pulling drives on the fly, but everything went well.
We didn't see much of a performance drop during the rebuild, even though most articles I've read warned of a huge hit; our website log statistics showed only a modest dip. Other than that, everything went very smoothly for a hard drive failure.
The lesson here: always have backups, and check your servers often for failing hardware. Preventive checks are much easier to handle than rebuilding a machine from the ground up.