Storage Management

Linux Software RAID (md)

Linux Software RAID is part of the Linux kernel. The corresponding devices are implemented by the md (Multiple Devices) device driver. Setup and management of RAID devices are done with the mdadm command (see its man page for more details). Status information is obtained from /proc/mdstat.
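To inspect the state of a single array in more detail, mdadm itself can be queried. For example (assuming an existing array /dev/md0):

0 root@cl-head ~ # mdadm --detail /dev/md0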

How to replace a failed disk in a Software RAID Setup

In case of a disk failure, use mdadm to remove the failed disk from the RAID array. After replacing the disk, first partition it like the old one, then use mdadm again to add the new disk to the RAID array. A failed disk is marked with (F) in /proc/mdstat.

Example:

0 root@cl-head ~ # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sdb2[0](F) sda2[1]
      17414336 blocks [2/1] [_U]
md2 : active raid1 sdb3[1] sda3[0]
      18322048 blocks [2/2] [UU]
unused devices: <none>

In this example, disk /dev/sdb has failed. While the disk error only affected /dev/md1, other partitions of the faulty disk are also part of /dev/md0 and /dev/md2, so they need to be removed as well before the disk can be replaced. Hence, the following commands need to be executed:

To remove the faulty partition:

0 root@cl-head ~ # mdadm -r /dev/md1 /dev/sdb2

To mark the other affected partitions on the disk as faulty and remove them:

0 root@cl-head ~ # mdadm -f /dev/md0 /dev/sdb1
0 root@cl-head ~ # mdadm -r /dev/md0 /dev/sdb1
0 root@cl-head ~ # mdadm -f /dev/md2 /dev/sdb3
0 root@cl-head ~ # mdadm -r /dev/md2 /dev/sdb3
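Alternatively, failing and removing a partition can be combined into a single mdadm call, shown here for /dev/sdb1:

0 root@cl-head ~ # mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1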

Now the disk is not accessed any more and can be removed. After the new disk has been inserted, partition it like the other disk(s), e.g. with parted.
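Instead of creating each partition manually with parted, the complete partition table can also be copied from the healthy disk with sfdisk (a sketch, assuming /dev/sda is the surviving disk and /dev/sdb the replacement; double-check the device names, since writing a partition table to the wrong disk is destructive):

0 root@cl-head ~ # sfdisk -d /dev/sda > /tmp/sda.layout
0 root@cl-head ~ # sfdisk /dev/sdb < /tmp/sda.layout

Afterwards, re-add the new partitions to their arrays to start the resync: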

0 root@cl-head ~ # mdadm -a /dev/md0 /dev/sdb1
0 root@cl-head ~ # mdadm -a /dev/md1 /dev/sdb2
0 root@cl-head ~ # mdadm -a /dev/md2 /dev/sdb3

To watch the resync process, you can enter:

0 root@cl-head ~ # watch --differences=cumulative cat /proc/mdstat

Press Ctrl-c to exit the display.
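In scripts it is often preferable to block until the resync has finished instead of watching it. mdadm can wait for this (assuming the recovering array /dev/md1 from the example above):

0 root@cl-head ~ # mdadm --wait /dev/md1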

After an OS RAID boot disk has been replaced, one also has to make sure that the BIOS will be able to boot from the new disk. On systems installed/booting in (U)EFI mode, execute the shell function grub-install-uefi-raid for that. On systems running in legacy BIOS mode, run dpkg-reconfigure grub-pc and make sure the new disk is selected when prompted for the GRUB install locations (leave all other options as they are).
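On legacy BIOS systems, the boot loader can alternatively be written to the new disk directly (a sketch, assuming the replaced disk is /dev/sdb):

0 root@cl-head ~ # grub-install /dev/sdb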

Logical Volume Management

The Linux Logical Volume Manager (LVM) provides a convenient and flexible way of managing storage. Storage devices like hard disks or RAID sets are registered as physical volumes, which are then assigned to volume groups. A volume group contains one or more logical volumes, which can be resized according to the storage space available in the volume group. New physical volumes can be added to or removed from a volume group at any time, thereby transparently enlarging or reducing its storage space. Filesystems are created on top of logical volumes.

Examples:

0 root@cl-head ~ # pvcreate /dev/sdb1
0 root@cl-head ~ # vgcreate vg0 /dev/sdb1
0 root@cl-head ~ # lvcreate -n scratch -L 1G vg0

These commands declare /dev/sdb1 as a physical volume, create the volume group vg0 with the physical volume /dev/sdb1, and create a logical volume /dev/vg0/scratch of size 1 GiB. You can now create a filesystem on this logical volume and mount it:

0 root@cl-head ~ # mkfs.ext4 /dev/vg0/scratch
0 root@cl-head ~ # mount /dev/vg0/scratch /scratch
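As mentioned above, a volume group can be enlarged at any time by registering an additional physical volume and adding it to the group (a sketch, assuming a spare partition /dev/sdc1 exists):

0 root@cl-head ~ # pvcreate /dev/sdc1
0 root@cl-head ~ # vgextend vg0 /dev/sdc1

To take the device out again later, first migrate its data away with pvmove and then remove it from the group with vgreduce.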

To increase the size of the filesystem, you do not have to unmount it. Just extend the logical volume first, then resize the filesystem:

0 root@cl-head ~ # lvextend -L +1G /dev/vg0/scratch
0 root@cl-head ~ # resize2fs /dev/vg0/scratch

This increases the filesystem by 1 GiB. If you want to decrease the size of the filesystem, you first need to unmount it. After that, check the filesystem, shrink it, and finally reduce the logical volume:

0 root@cl-head ~ # umount /scratch
0 root@cl-head ~ # e2fsck -f /dev/vg0/scratch
0 root@cl-head ~ # resize2fs /dev/vg0/scratch 500M
0 root@cl-head ~ # lvreduce -L 500M /dev/vg0/scratch
0 root@cl-head ~ # mount /dev/vg0/scratch /scratch

This decreases the filesystem to 500 MiB. To check how much space is left in a volume group, use the command vgdisplay and look for the line showing Free PE / Size.
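For example (using the volume group vg0 from above):

0 root@cl-head ~ # vgdisplay vg0 | grep -i free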

The most common LVM commands are:

  • Physical volumes: pvcreate

  • Volume groups: vgscan, vgchange, vgdisplay, vgcreate, vgremove

  • Logical volumes: lvdisplay, lvcreate, lvextend, lvreduce, lvremove