Replacing a failed disk in software RAID


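Mark the drive as failed first, then remove it from the array. Here mdX is the array and sdYX is the member partition on the faulty drive.

mdadm /dev/mdX -f /dev/sdYX
mdadm /dev/mdX -r /dev/sdYX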
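
Before pulling the drive it can help to confirm that the kernel really flagged the member as faulty. The following is a minimal sketch, assuming the usual /proc/mdstat layout in which a failed member is followed by (F); the function name member_is_faulty is made up for this example.

```shell
# member_is_faulty MDSTAT_FILE ARRAY MEMBER
# Returns 0 if MEMBER (e.g. sdb1) carries the (F) flag on ARRAY's line.
# Hypothetical helper; it only reads the file and changes nothing.
member_is_faulty() {
    mdstat_file=$1
    array=$2
    member=$3
    # Array lines look like: "md0 : active raid1 sdb1[1](F) sda1[0]"
    grep "^${array} :" "$mdstat_file" \
        | grep -Eq "(^| )${member}\[[0-9]+\]\(F\)"
}
```

Usage, with placeholder names: member_is_faulty /proc/mdstat md0 sdb1 && echo "sdb1 is marked faulty". Members that carry extra flags between the slot number and (F) are not matched by this simple pattern.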

Power down and physically replace the faulty drive. Make sure the partitions on the new drive are properly aligned, as described in http://www.ibm.com/developerworks/linux/library/l-4kb-sector-disks/#tools
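
A quick way to check alignment once the new drive is partitioned is to look at the partition's start sector. This is only a sketch: it assumes the start value is given in 512-byte sectors (as /sys/block/<disk>/<partition>/start reports it) and that 4 KiB alignment is what matters for the drive. The function name is_4k_aligned is invented here.

```shell
# is_4k_aligned START_SECTOR
# START_SECTOR is in 512-byte units, so a 4 KiB boundary is every 8 sectors.
# Returns 0 when the partition starts on such a boundary.
is_4k_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}
```

Usage, with placeholder device names: is_4k_aligned "$(cat /sys/block/sdY/sdY1/start)" && echo aligned. A start sector of 2048 passes the check, while the old default of 63 does not.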

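Boot and partition the new drive. To give it the same partition layout as the other drives, copy the partition table from a healthy drive (sdX) to the new drive (sdY). Then check the array state, add the new partition to the array, and watch the rebuild: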

sfdisk -d /dev/sdX | sfdisk /dev/sdY
mdadm --detail /dev/mdX
mdadm /dev/mdX -a /dev/sdYX
watch -n .1 cat /proc/mdstat
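
For scripted use, watching /proc/mdstat by eye can be replaced by a small polling loop. The sketch below assumes that an ongoing rebuild shows up in /proc/mdstat as a line containing "recovery =" or "resync =" (a queued one as "resync=DELAYED" or similar); the function names are hypothetical.

```shell
# rebuild_running [MDSTAT_FILE]
# Returns 0 while any array reports a recovery or resync in progress.
rebuild_running() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

# wait_for_rebuild [MDSTAT_FILE] [INTERVAL_SECONDS]
# Polls until no recovery or resync is reported any more.
wait_for_rebuild() {
    while rebuild_running "${1:-/proc/mdstat}"; do
        sleep "${2:-10}"
    done
}
```

Usage: wait_for_rebuild && echo "rebuild finished". mdadm also offers a --wait option for a single array, which may be the simpler choice when only one device is of interest.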

If the failed disk was not removed from the array, the array may need to be assembled manually.
First examine the current RAID configuration and, if needed, update the system mdadm.conf.

mdadm --examine --scan
nano -w /etc/mdadm.conf
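
Instead of copying ARRAY lines by hand, the scan output can be merged into the config file. The following sketch assumes each ARRAY line carries a UUID=... field and treats an array as already configured when that UUID appears anywhere in the file. The function name append_new_arrays is made up for this example, and the config path is whatever your distribution uses.

```shell
# append_new_arrays CONF
# Reads "mdadm --examine --scan" style output on stdin and appends to CONF
# only those ARRAY lines whose UUID is not mentioned in CONF yet.
append_new_arrays() {
    conf=$1
    while IFS= read -r line; do
        case $line in
            ARRAY*) ;;
            *) continue ;;
        esac
        # Extract the value of the UUID= field; skip lines without one.
        uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID=\([^ ]*\).*/\1/p')
        [ -n "$uuid" ] || continue
        grep -qF -- "UUID=$uuid" "$conf" 2>/dev/null \
            || printf '%s\n' "$line" >> "$conf"
    done
}
```

Usage: mdadm --examine --scan | append_new_arrays /etc/mdadm.conf. Existing entries are never rewritten, so an outdated line for a known UUID still has to be fixed in the editor.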

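Assemble the array. Try the following commands in order, moving on to the next one only if the previous one fails: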

mdadm --assemble /dev/mdX
mdadm --assemble /dev/mdX --scan
mdadm --assemble /dev/mdX --scan --force

If assembly still fails, stop the array and then try to assemble it again:

mdadm --stop /dev/mdX
mdadm --assemble /dev/mdX --scan --force
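
The assembly attempts above can be chained so that each stronger variant runs only when the previous one failed. This is a sketch of that idea, not a recommended default: --force can bring up an array with out-of-date members, so it is worth reading the error output before letting a script escalate. The function name assemble_escalating and the MDADM override variable are assumptions made for this example.

```shell
# MDADM can be pointed at a different binary or a stub for dry runs.
MDADM=${MDADM:-mdadm}

# assemble_escalating /dev/mdX
# Tries the assemble variants from mildest to strongest and stops at the
# first one that succeeds. Returns the status of the last attempt.
assemble_escalating() {
    md=$1
    "$MDADM" --assemble "$md" && return 0
    "$MDADM" --assemble "$md" --scan && return 0
    "$MDADM" --assemble "$md" --scan --force && return 0
    # Last resort: stop a half-assembled array, then force once more.
    "$MDADM" --stop "$md"
    "$MDADM" --assemble "$md" --scan --force
}
```

Usage: assemble_escalating /dev/mdX || echo "assembly failed, check dmesg".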

Finally, add the new drive:

mdadm --manage /dev/mdX --add /dev/sdYX

RAID repair
echo repair >> /sys/block/md1/md/sync_action
echo repair >> /sys/block/md2/md/sync_action
echo repair >> /sys/block/md3/md/sync_action

RAID maintenance and check
echo check >> /sys/block/md1/md/sync_action
echo check >> /sys/block/md2/md/sync_action
echo check >> /sys/block/md3/md/sync_action
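
Rather than listing every array by hand, the same write can be looped over all md devices. A minimal sketch, assuming the arrays appear as /sys/block/md*/md/sync_action; the function name start_sync_action and the optional second argument (an alternative sysfs root, handy for dry runs) are invented for this example.

```shell
# start_sync_action ACTION [SYSFS_BLOCK_ROOT]
# Writes ACTION (e.g. check or repair) to the sync_action file of every
# md array found under the given root, /sys/block by default.
start_sync_action() {
    action=$1
    root=${2:-/sys/block}
    for f in "$root"/md*/md/sync_action; do
        # Skip the unexpanded pattern when no array exists.
        [ -e "$f" ] || continue
        echo "$action" > "$f"
    done
}
```

Usage: start_sync_action check. After a check has finished, the number of inconsistencies found can be read from /sys/block/mdX/md/mismatch_cnt.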