Over the lifetime of a system an HDD can fail; HDDs and PSUs are the most common parts to suffer hardware failure. Replacing an HDD is straightforward, just follow the steps below.
1) Add the new HDD mount point
Use the devblk_info.lua command to find the serial number of the unused (new) HDD.
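If you want to cross-check the serial numbers with a standard tool (a generic sketch using plain lsblk, not the FMADIO-specific devblk_info.lua output), the SERIAL column can be listed directly:

$ lsblk -o NAME,SERIAL,SIZE,TYPE

On this example system the data HDDs are the 3.7T disks; the new drive is the one whose serial number is not yet referenced by the existing hdd* mapping.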
2) Reboot the system
Reboot the system so the new HDD drive mappings get updated. The files in /opt/fmadio/disk/ should have links for hdd0, hdd1, hdd2 and hdd3.
NOTE: the actual /dev/sd* value changes pseudo-randomly and is not important. This is why the disks are mapped using their serial numbers.
fmadio@fmadio20-049:~$ ls -al /opt/fmadio/disk/hdd*
lrwxrwxrwx 1 root root 8 May 25 15:10 /opt/fmadio/disk/hdd0 -> /dev/sdd
lrwxrwxrwx 1 root root 8 May 25 15:10 /opt/fmadio/disk/hdd1 -> /dev/sde
lrwxrwxrwx 1 root root 8 May 25 15:10 /opt/fmadio/disk/hdd2 -> /dev/sdc
lrwxrwxrwx 1 root root 8 May 25 15:10 /opt/fmadio/disk/hdd3 -> /dev/sdb
fmadio@fmadio20-049:~$
3) Start the RAID array
Check if /dev/md0 has been started. The following output shows /dev/md0 started; note the md0 partitions.
fmadio@fmadio20-049:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 223.6G 0 disk
|-sda1 8:1 0 14.9G 0 part /mnt/sda1
`-sda2 8:2 0 208.7G 0 part /mnt/store0
sdb 8:16 0 3.7T 0 disk
`-md0 9:0 0 10.9T 0 raid5
|-md0p1 259:0 0 59.6G 0 md
`-md0p2 259:1 0 10.9T 0 md
sdc 8:32 0 3.7T 0 disk
`-md0 9:0 0 10.9T 0 raid5
|-md0p1 259:0 0 59.6G 0 md
`-md0p2 259:1 0 10.9T 0 md
sdd 8:48 0 3.7T 0 disk
`-md0 9:0 0 10.9T 0 raid5
|-md0p1 259:0 0 59.6G 0 md
`-md0p2 259:1 0 10.9T 0 md
sde 8:64 0 3.7T 0 disk
sdf 8:80 1 223.6G 0 disk
sdg 8:96 1 223.6G 0 disk
sdh 8:112 1 223.6G 0 disk
sdi 8:128 1 223.6G 0 disk
sdj 8:144 1 223.6G 0 disk
sdk 8:160 1 223.6G 0 disk
sdl 8:176 1 223.6G 0 disk
sdm 8:192 1 223.6G 0 disk
The following shows the output when /dev/md0 has not been started; notice the md0 partitions are missing.
fmadio@fmadio20-049:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 223.6G 0 disk
|-sda1 8:1 0 14.9G 0 part /mnt/sda1
`-sda2 8:2 0 208.7G 0 part /mnt/store0
sdb 8:16 0 3.7T 0 disk
sdc 8:32 0 3.7T 0 disk
sdd 8:48 0 3.7T 0 disk
sde 8:64 0 3.7T 0 disk
sdf 8:80 1 223.6G 0 disk
sdg 8:96 1 223.6G 0 disk
sdh 8:112 1 223.6G 0 disk
sdi 8:128 1 223.6G 0 disk
sdj 8:144 1 223.6G 0 disk
sdk 8:160 1 223.6G 0 disk
sdl 8:176 1 223.6G 0 disk
sdm 8:192 1 223.6G 0 disk
fmadio@fmadio20-049:~$
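As an additional check, the kernel's mdraid status file /proc/mdstat (standard Linux, not FMADIO-specific) also shows whether md0 is assembled and which member disks it contains:

$ cat /proc/mdstat

If md0 is listed there with its member drives, the array has been started; if it is absent, proceed with the forced assembly below.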
To force assembly of the array, run the following:
$ sudo mdadm --assemble /dev/md0 --force /opt/fmadio/disk/hdd0 /opt/fmadio/disk/hdd1 /opt/fmadio/disk/hdd2 /opt/fmadio/disk/hdd3
mdadm: no RAID superblock on /opt/fmadio/disk/hdd1
mdadm: /opt/fmadio/disk/hdd1 has no superblock - assembly aborted
fmadio@fmadio20-049:~$
In the above output, HDD1 is missing (i.e. the failed drive). Remove it (/opt/fmadio/disk/hdd1) from the device list and run the assembly command again as follows:
fmadio@fmadio20-049:~$ sudo mdadm --assemble /dev/md0 --force /opt/fmadio/disk/hdd0 /opt/fmadio/disk/hdd2 /opt/fmadio/disk/hdd3
mdadm: /dev/md0 has been started with 3 drives (out of 4).
fmadio@fmadio20-049:~$
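As a side note, if it is not obvious which device link corresponds to the failed drive, each member can be inspected individually; mdadm --examine prints the RAID superblock of a single device (or an error if the superblock is missing). A generic example:

$ sudo mdadm --examine /opt/fmadio/disk/hdd0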
Then get details of the array status with mdadm --detail /dev/md0:
fmadio@fmadio20-049:~$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Tue Nov 5 23:31:47 2019
Raid Level : raid5
Array Size : 11720662464 (11177.70 GiB 12001.96 GB)
Used Dev Size : 3906887488 (3725.90 GiB 4000.65 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue May 25 15:07:01 2021
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : fmadio20-049:0 (local to host fmadio20-049)
UUID : 5f5b2011:dde52c71:88d6fcd4:0930bb91
Events : 2
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
6 0 0 6 removed
fmadio@fmadio20-049:~$
Notice in this case the "State" is "clean, degraded" and one drive is removed
4) Add the new drive to the array
Next, add the new drive to the array using the mdadm --add /dev/md0 command, passing in the device link of the new drive. In this case we are adding /opt/fmadio/disk/hdd1.
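The add command itself looks like the following (a sketch, assuming the replacement drive is the one mapped to /opt/fmadio/disk/hdd1 as in this example):

$ sudo mdadm --add /dev/md0 /opt/fmadio/disk/hdd1

Once the drive has been added, mdadm --detail /dev/md0 shows the rebuild in progress: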
fmadio@fmadio20-049:~$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Tue Nov 5 23:31:47 2019
Raid Level : raid5
Array Size : 11720662464 (11177.70 GiB 12001.96 GB)
Used Dev Size : 3906887488 (3725.90 GiB 4000.65 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed May 26 07:14:31 2021
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 0% complete
Name : fmadio20-049:0 (local to host fmadio20-049)
UUID : 5f5b2011:dde52c71:88d6fcd4:0930bb91
Events : 9
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
4 8 64 3 spare rebuilding /dev/sde
fmadio@fmadio20-049:~$
You can see the new array state is "clean, degraded, recovering" and the individual disk state is "spare rebuilding".
During the rebuild, the write-back performance from SSD to HDD is reduced; however, the capture rate to the SSDs (1TB-4TB) is not impacted at all, e.g. full 20Gbps line rate is no problem.
Typically it takes 8-24 hours to completely rebuild an array, depending on the total storage size. The completion status can be monitored in the output of mdadm --detail /dev/md0, for example:
Rebuild Status : 7% complete
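The same progress information is also available from the standard /proc/mdstat interface, which can be polled periodically, for example (assuming the watch utility is available):

$ watch -n 60 cat /proc/mdstat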
5) Final reboot
After the RAID5 rebuild has completed, please do a final reboot.
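As an optional check after the reboot, confirm the array is healthy again, i.e. the "State" reports "clean" and all four drives show "active sync":

$ sudo mdadm --detail /dev/md0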