Replacing a failed disk in the ZFS root

Introductory Concepts

This document is a guide for administrators who are already familiar with computing hardware platforms and storage concepts such as RAID. If you already understand the general failure process, you can skip ahead to the sections on replacing a drive and repairing the pool.

Degrees of verbosity

Whenever a drive fails or reports errors, SmartOS logs a great deal of information, and you have to drill down through it to find out what is causing the disk failure. The commands below are listed in order of increasing verbosity:

The 'zpool status' command provides an overview of the pool's health.

iostat provides high-level error counts as well as specific information about the devices.

fmadm faulty helps pinpoint more precisely what caused the disk failure; fmadm can also clear transitory faults.

The fmdump command provides a log of fault events from the last {n} days.

Beyond identifying which disk to replace, this information can be extremely helpful in isolating the root cause when the problem is more complex than a simple disk failure; a typical triage pass is shown below.
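
For example, a quick first pass on a SmartOS host might look like this (the flags shown are the usual ones for these tools; exact options can differ between releases):

zpool status -x     # show only pools that currently have problems
iostat -En          # per-device error counters and identity details
fmadm faulty        # active faults diagnosed by FMA
fmdump -t 7day      # fault events from the last 7 days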

General failure process

When a disk failure occurs, ZFS is not the first component to detect and respond to it. Instead, the following general order of events takes place:

1. FMA (Fault Management Architecture) detects and logs a failed disk. FMA monitors and manages hardware faults in the system.

2. Once FMA detects the failed disk, the operating system steps in and removes the disk from the system.

3. ZFS then becomes aware of the changed state: it detects that the disk has been removed and responds by faulting the device, that is, marking the disk as faulty and taking the necessary measures to maintain data integrity and availability.

This sequence ensures that disk failures are promptly detected, logged, and handled: FMA identifies the failed disk, the operating system removes it, and ZFS reacts to the changed state by faulting the device, keeping the storage infrastructure stable and reliable.

Note that this is a general description; the exact process may vary depending on the specific implementation and configuration of the system.
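
As an example of what the end of this chain looks like, zpool status on a two-way mirror with a faulted member might report something like the following (pool layout, device names and error counts are purely illustrative):

  pool: rpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
action: Replace the faulted device, or use 'zpool clear' to mark the device repaired.
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            sda3    ONLINE       0     0     0
            sdb3    FAULTED      3   112     0  too many errors

errors: No known data errors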

How to replace a drive

High-level overview of a failed disk replacement in the ZFS root

Note: Before proceeding with the disk replacement process, ensure that you have correctly identified the failed disk and that you have a spare disk available for replacement.

Step-by-Step Guide to replacing a failed disk in the ZFS root

Suppose the server has two disks, /dev/sda and /dev/sdb, and one of them, in this example /dev/sdb, has failed. The damaged disk has to be replaced.

First, let's identify our pool with the command:

zpool list
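
The output will look roughly like this (the exact columns vary between ZFS versions; the sizes here are illustrative). With a failed mirror member, HEALTH will typically show DEGRADED, and zpool status tells you which device is at fault:

NAME    SIZE  ALLOC   FREE  FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool   222G  57.4G   165G    9%    25%  1.00x  DEGRADED  -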

Before physically replacing the disk, take the failed device offline so that ZFS stops using it (the device remains in the pool configuration, which lets us use zpool replace later):

zpool offline rpool /dev/sdb3
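
Before powering down, you can confirm that the device really is out of service; in zpool status the pool should now show as DEGRADED and /dev/sdb3 as OFFLINE:

zpool status rpool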

Next, power off the server if you cannot hot-swap disks. If your controller and backplane support hot-swapping, everything described below can be done without stopping the server.

poweroff

Physically replace the failed disk.

Determining the partition table (GPT or MBR) and transferring it to the new disk

After replacing the damaged disk, you need to determine whether the disks use a GPT or MBR partition table.
To do this, use gdisk. Install it first:

apt-get install gdisk -y

Run the command:

gdisk -l /dev/sda

where /dev/sda is the healthy disk that remains in the pool.

For MBR the output will be approximately the following:

Partition table scan:
MBR: MBR only
BSD: not present
APM: not present
GPT: not present

For GPT, it is roughly as follows:

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present

Before adding the new disk to the pool, it must be partitioned exactly like the healthy sda disk. How this is done depends on the partition table type.

Copying partitioning for GPT

To copy GPT partitioning: 

Note! In the command below, the first argument is the disk the partition table is copied to, and the second is the disk it is copied from. If you mix them up, the partition table on the healthy disk will be destroyed.

sgdisk -R /dev/sdb /dev/sda

Assign new random GUIDs to the disk and its partitions so they do not clash with those of the source disk:

sgdisk -G /dev/sdb
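
It is worth double-checking that the new disk now carries the same layout as the healthy one; sgdisk can print both partition tables for a side-by-side comparison:

sgdisk -p /dev/sda
sgdisk -p /dev/sdb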

Copying partitioning for MBR

To copy MBR partitioning:

Note!
Here the order is reversed: the first disk is the one you copy the partition table from, and the second is the one you copy it to.

sfdisk -d /dev/sda | sfdisk /dev/sdb
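
As with GPT, it is a good idea to verify that both disks now have identical layouts before continuing:

sfdisk -l /dev/sda
sfdisk -l /dev/sdb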

If the new partitions are not visible to the system, you can re-read the partition table with the command:

sfdisk -R /dev/sdb
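
If your version of sfdisk no longer supports the -R option, the same effect can be achieved with partprobe (from the parted package) or blockdev:

partprobe /dev/sdb
blockdev --rereadpt /dev/sdb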

Installing the bootloader

After partitioning the disk, you need to install the bootloader on it:

grub-install /dev/sdb
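
On a BIOS/MBR-booting system a successful run typically ends with output along these lines (illustrative):

Installing for i386-pc platform.
Installation finished. No error reported.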

Adding a disk 

Tell ZFS to replace the failed device with the new one in the same location (in our case the pool device is /dev/sdb3):

zpool replace rpool /dev/sdb3

Bring the device back online in the pool:

zpool online rpool /dev/sdb3

Wait for the pool to finish resilvering; you can monitor progress with:

zpool status
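
While the resilver is running, the scan line of zpool status will show progress roughly like this (values illustrative); once it reports that the resilver completed with 0 errors and the pool state returns to ONLINE, the replacement is done:

  scan: resilver in progress since Tue Mar  4 10:21:13 2025
        38.4G resilvered, 18.12% done, 0:08:15 to go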

Conclusion

By following these instructions, you can replace a failed disk in the ZFS root without compromising the integrity of your data. Take the usual precautions: make sure you have identified the failed disk correctly, keep a current backup before you start, and verify with zpool status that resilvering has completed without errors.