Replacement of a failed drive in a zpool
July 30, 2025
Introduction
ZFS (Zettabyte File System) is a robust, enterprise-grade file system that combines the features of a file system and a volume manager. One of its most powerful attributes is its ability to manage and maintain large storage pools with built-in support for RAID-like configurations, such as RAID 0, RAID 1, RAIDZ, and RAID 10. Among these, RAID 10 is a hybrid configuration combining striping (RAID 0) and mirroring (RAID 1), offering both performance and redundancy. However, like any storage setup, drives can and do fail. Replacing a failed drive in a RAID 10 zpool is a critical process that must be performed with precision to maintain data integrity and system uptime.

Unlike traditional hardware RAID, ZFS manages redundancy through its own software-based system. A typical RAID 10 zpool in ZFS is built by creating multiple mirrored vdevs (virtual devices) that are then striped together. This setup allows for high read/write performance and redundancy, as each mirror can tolerate the failure of one disk without data loss.
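For illustration, a four-disk RAID 10 style pool is simply two mirror vdevs striped together. A minimal sketch of how such a pool is created follows; the pool name tank and the device names are placeholders, not the devices of the system described below:

zpool create tank mirror da0 da1 mirror da2 da3
zpool status tank

ZFS stripes writes across the two mirrors, and the pool keeps working as long as at least one disk in each mirror remains healthy.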
The rXg leverages the robust ZFS file system, specifically through zpools, to ensure high availability (HA) and data redundancy. This configuration is critical for maintaining continuous operation and protecting valuable data from drive failures, a common occurrence in any storage environment. By using ZFS mirrors within a zpool, the rXg platform can withstand the loss of individual drives without impacting data accessibility or integrity, allowing for hot-swapping and resilvering of new drives to restore full redundancy.
Zpool status with a failed drive
Detecting a failed drive is the first step in the replacement process. Symptoms may include:
- Warnings from monitoring tools or ZFS itself.
- Slow performance or intermittent read/write errors.
- Messages in system logs about I/O errors or device timeouts.
- The zpool status showing a degraded or faulted state.
You can check the health of your zpool by running:
zpool status
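For scripted or periodic checks, the -x option is convenient: it prints the full status only for pools that have problems and otherwise reports a single "all pools are healthy" line.

zpool status -x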
The example below shows a zpool with one failed drive (da3p4), which reduces the pool's resilience to further drive errors. Note that thanks to the RAID 10 configuration, the rXg operating system and all applications remain fully accessible.
[root@rxg /space/rxg]# zpool status
  pool: zroot
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            da0p4   ONLINE       0     0     0
            da1p4   ONLINE       0     0     0
          mirror-1  DEGRADED     0     0     0
            da2p4   ONLINE       0     0     0
            da3p4   UNAVAIL      0     0     0  cannot open

errors: No known data errors
Replacing the failed drive
Once you have identified the failed drive (in this case, da3p4, located in slot 4 of the host platform):
- Power down the server if hot-swapping is not supported.
- Replace the failed drive with a new one, ideally of the same size and performance rating.
- Power the system back on and allow the new drive to be fully discovered.
It is advisable to record the drive serial numbers before the swap is performed, so that the operation can be double-checked afterwards, for example using the following combination of shell commands:
[root@rxg /space/rxg]# camcontrol devlist
<AHCI SGPIO Enclosure 2.00 0001>  at scbus4 target 0 lun 0 (ses0,pass0)
<AHCI SGPIO Enclosure 2.00 0001>  at scbus9 target 0 lun 0 (ses1,pass1)
<SAMSUNG MZILG800HCHQAD3 DWG9>    at scbus10 target 0 lun 0 (pass2,da0)
<SAMSUNG MZILG800HCHQAD3 DWG9>    at scbus10 target 1 lun 0 (pass3,da1)
<SAMSUNG MZILG800HCHQAD3 DWG9>    at scbus10 target 2 lun 0 (pass4,da2)
<SAMSUNG MZILG800HCHQAD3 DWG9>    at scbus10 target 3 lun 0 (pass5,da3)
<DP BP_PSV 7.10>                  at scbus10 target 4 lun 0 (ses2,pass6)
<PNY USB 3.2.1 FD PMAP>           at scbus11 target 0 lun 0 (da4,pass7)
[root@rxg /space/rxg]# diskinfo -s /dev/da0
S6LANA0XC02694
[root@rxg /space/rxg]# diskinfo -s /dev/da1
S6LANA0XC02693
[root@rxg /space/rxg]# diskinfo -s /dev/da2
S6LANA0XC02692
[root@rxg /space/rxg]# diskinfo -s /dev/da3
S6LANA0XC02695
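To collect all the serial numbers in one step, the individual diskinfo calls can be wrapped in a small loop. A sketch, assuming the data drives are da0 through da3 as in this example and that it is run from /bin/sh:

for d in da0 da1 da2 da3; do printf '%s: ' $d; diskinfo -s /dev/$d; done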
If your system supports hot-swapping, you can remove and replace the drive without shutting down. After replacing the physical drive, the new disk will be detected by the operating system.
The listing below shows the available drives and their partitions after the physical replacement was made. The new drive appears as /dev/da3, which can be confirmed by checking its serial number:
[root@rxg /space/rxg]# ls -lah /dev/da*
crw-r-----  1 root operator 0xfb  Jul 21 13:05 /dev/da0
crw-r-----  1 root operator 0x103 Jul 21 13:05 /dev/da0p1
crw-r-----  1 root operator 0x104 Jul 21 13:05 /dev/da0p2
crw-r-----  1 root operator 0x105 Jul 21 13:05 /dev/da0p3
crw-r-----  1 root operator 0x106 Jul 21 13:05 /dev/da0p4
crw-r-----  1 root operator 0xfa  Jul 21 13:05 /dev/da1
crw-r-----  1 root operator 0xfe  Jul 21 13:05 /dev/da1p1
crw-r-----  1 root operator 0xff  Jul 21 13:05 /dev/da1p2
crw-r-----  1 root operator 0x100 Jul 21 13:05 /dev/da1p3
crw-r-----  1 root operator 0x102 Jul 21 13:05 /dev/da1p4
crw-r-----  1 root operator 0xfc  Jul 21 13:05 /dev/da2
crw-r-----  1 root operator 0x107 Jul 21 13:05 /dev/da2p1
crw-r-----  1 root operator 0x109 Jul 21 13:05 /dev/da2p2
crw-r-----  1 root operator 0x10b Jul 21 13:05 /dev/da2p3
crw-r-----  1 root operator 0x10d Jul 21 13:05 /dev/da2p4
crw-r-----  1 root operator 0xf8  Jul 21 13:05 /dev/da3
[root@rxg /space/rxg]# diskinfo -s /dev/da3
8EX0A0BX0KX3
The replacement command, executed in the shell as root, is:
zpool replace zroot da3p4 /dev/da3
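The general form of the command is:

zpool replace <pool> <failed-device> <new-device>

If the replacement disk appears under the same device node as the failed one, the last argument can be omitted. In this example the failed member was the partition da3p4, and the whole new disk /dev/da3 was supplied as its replacement.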
ZFS will then begin a process known as "resilvering," where it rebuilds the data on the new disk using its mirror.
The zpool status immediately afterwards shows the new drive replacing the failed one:
[root@rxg /space/rxg]# zpool status
  pool: zroot
 state: DEGRADED
  scan: resilvered 6.42G in 00:00:08 with 0 errors on Mon Jul 21 13:15:47 2025
config:

        NAME             STATE     READ WRITE CKSUM
        zroot            DEGRADED     0     0     0
          mirror-0       ONLINE       0     0     0
            da0p4        ONLINE       0     0     0
            da1p4        ONLINE       0     0     0
          mirror-1       DEGRADED     0     0     0
            da2p4        ONLINE       0     0     0
            replacing-1  DEGRADED     0     0     0
              da3p4      UNAVAIL      0     0     0  cannot open
              da3        ONLINE       0     0     0

errors: No known data errors
Pool status after the replacement was completed
[root@rxg /space/rxg]# zpool status
  pool: zroot
 state: ONLINE
  scan: resilvered 6.42G in 00:00:08 with 0 errors on Mon Jul 21 13:15:47 2025
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0p4   ONLINE       0     0     0
            da1p4   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da2p4   ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors
Note that the drive was adopted into the pool and properly resilvered. Given the amount of data involved and the speed of the drive, the resilvering process was likely too quick to be observed in progress, but with slower or fuller drives the following output can be seen in zpool status:
scan: resilver in progress since ... 5.21G scanned out of 100G at 100M/s, 10m to go
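If you want to follow a longer resilver while it runs, the status can simply be polled. A minimal sketch, assuming the pool is named zroot as above and run from /bin/sh; the 10-second interval is arbitrary:

while true; do zpool status zroot | grep -E 'state:|resilver'; sleep 10; done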
Sometimes, ZFS may refuse to use a disk that still contains old partition tables or metadata. On a Linux host, you can wipe such signatures from the new disk with:
wipefs -a /dev/sdX
Replace /dev/sdX with the device identifier of the new disk.
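On a FreeBSD-based platform such as the rXg used in this example, wipefs is not available; old ZFS labels and partition tables can be cleared with ZFS and gpart instead. A sketch, assuming the new disk is /dev/da3 as above (double check the device name first, as both commands destroy data on the target disk):

zpool labelclear -f /dev/da3
gpart destroy -F da3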
Best Practices for ZFS Drive Pools
To ensure long-term reliability and performance of your zpool, consider the following best practices:
- Use matched drives: Always use drives of the same size and performance characteristics, ideally the recommended replacement models.
- Monitor regularly: Use monitoring tools such as Zabbix or Prometheus, or even a simple cron job built around zpool status, to catch failures early (see the sketch after this list).
- Test drive replacements: In a lab environment, practice the drive replacement process so you're prepared when it happens in production.
- Maintain spares: Keep spare drives on hand for quick replacement, especially if the host is mission critical and must recover from a drive failure as quickly as possible.
- Enable alerts: Configure email or logging alerts to notify you of any zpool degradation or errors.
- Use persistent device names: Instead of bare device nodes such as /dev/daX or /dev/sdX, use stable identifiers (GPT labels or /dev/diskid/ entries on FreeBSD, /dev/disk/by-id/ on Linux) or rely on ZFS's GUID-based device identification to avoid confusion when device names change. This is especially visible in the example above, where the failed drive and its replacement were assigned the same device name.
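As a minimal sketch of the cron-based monitoring mentioned above, the /etc/crontab entry below mails the pool status only when something is wrong; the schedule, paths, and recipient address are placeholders (zpool status -x prints "all pools are healthy" when no action is needed):

0 * * * * root /sbin/zpool status -x | grep -qv 'all pools are healthy' && /sbin/zpool status | mail -s "zpool alert on $(hostname)" admin@example.com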