I have another article that discusses conventional RAID. In contrast to conventional RAID, some modern cluster file systems such as GPFS™, Ceph, and Gluster use a declustered array.

Their RAID implements a sophisticated data and spare-space disk layout scheme that allows for arbitrarily sized disk arrays while also reducing the overhead to clients when recovering from disk failures. To accomplish this, the cluster RAID uniformly spreads, or declusters, user data, redundancy information, and spare space across all the disks of a declustered array.

This article is based on IBM's GPFS documentation on declustered RAID; I have made it more general.


What does it look like?

Figure 1 below compares a conventional RAID layout with an equivalent declustered array. Both use 7 disks.

The left side shows 3 conventional RAID 1 (mirror) arrays on 6 disks, plus one spare disk for the storage system. Each one is a typical 1-fault-tolerant 1 + 1 replicated RAID array.

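The right side shows a declustered array on 7 disks. Suppose each disk holds 7 strips, one per track, so that is 49 strips in total striped across the 7 disks. In the conventional layout, each of the 3 arrays has 7 tracks of 2 mirrored strips, which gives 21 tracks (42 strips), and the spare disk supplies the remaining 7 strips.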

To decluster the array, the strips of each disk position for every track are then arbitrarily allocated onto the disks of the declustered array shown on the lower right (in this case, by vertically sliding down and compacting the strips from above). The spare strips are uniformly inserted, one per disk.

Data is no longer aligned disk by disk as it is in a conventional array. Each strip is a single disk block, and in this mirrored case its mirror strip can be on any other disk in the declustered array.


Figure 1. Conventional RAID versus declustered RAID layouts.
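
The layout described above can be sketched in a few lines of Python. This is only an illustration, not the placement algorithm of GPFS or any other file system: it assumes one mirrored track for every pair of disks, which for 7 disks happens to give exactly 21 tracks, and one spare strip per disk. The function name and the strip labels are made up for this example.

```python
from itertools import combinations

NUM_DISKS = 7


def build_declustered_layout(num_disks=NUM_DISKS):
    """Return (tracks, disks) for a toy mirrored declustered array.

    tracks: list of (disk_a, disk_b) pairs, one mirrored track per pair.
    disks:  dict mapping disk id -> list of strip labels on that disk.
    """
    # Assumption for illustration: one track per pair of disks.
    # With 7 disks this gives C(7, 2) = 21 tracks, i.e. 42 data strips.
    tracks = list(combinations(range(num_disks), 2))

    disks = {d: [] for d in range(num_disks)}
    for track_id, (a, b) in enumerate(tracks):
        # Both copies of a track carry the same label.
        disks[a].append(f"T{track_id}")
        disks[b].append(f"T{track_id}")

    # Spare space is spread uniformly: one spare strip per disk.
    for d in disks:
        disks[d].append("spare")
    return tracks, disks


tracks, disks = build_declustered_layout()
for d, strips in disks.items():
    print(f"disk {d}: {strips}")
```

Every disk ends up with 6 data strips and 1 spare strip, 49 strips in total, and each disk shares exactly one track with every other disk.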

If you are not sure how conventional RAID works, see "Understanding RAID levels".

How does it work?

Fundamentally, a declustered array still works the same way as a conventional array, and it supports different RAID levels depending on the particular cluster file system. However, an array is no longer tied to a fixed set of disk peers: the strips of each array are spread across all 7 disks.

As illustrated in Figure 2, a declustered array can significantly shorten the time required to recover from a disk failure, which lowers the rebuild overhead for client applications. When a disk fails, the lost data is rebuilt using all the operational disks in the declustered array, whose combined bandwidth is greater than that of the fewer disks in a conventional RAID group. Furthermore, if an additional disk fault occurs during a rebuild, the number of impacted tracks requiring repair is markedly lower than for the previous failure and lower than the constant rebuild overhead of a conventional array.

The rebuild impact and client overhead of a declustered array can be three to four times lower than those of a conventional RAID. Because GPFS, Ceph, and Gluster stripe data across all the storage nodes of a cluster, file system performance becomes less dependent on the speed of any single rebuilding storage array.

From an I/O perspective, it boosts performance too: for large sequential reads and writes, more disk spindles are involved in a declustered array.

How rebuilding works in detail

Figure 2. Lower rebuild overhead in conventional RAID versus declustered RAID.
When a single disk fails in the 1-fault-tolerant 1 + 1 conventional array on the left, the redundant disk is read and copied onto the spare disk, which requires a throughput of 7 strip I/O operations.
When a disk fails in the declustered array, all replica strips of the six impacted tracks are read from the surviving six disks and then written to six spare strips, for a throughput of 2 strip I/O operations. The bar chart illustrates disk read and write I/O throughput during the rebuild operations.
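
The 7 versus 2 comparison can be reproduced with a small model. This is a sketch under stated assumptions, not how any particular file system schedules its rebuild: it reuses a toy layout with one mirrored track per pair of disks and one spare strip per disk, and it writes each rebuilt strip to the spare of the next surviving disk so that both copies never share a disk. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def conventional_rebuild_io(strips_per_disk=7):
    """Strip I/Os per disk when one disk of a 1 + 1 mirror fails."""
    # The surviving mirror disk is read in full and the spare disk is
    # written in full, so each of them handles strips_per_disk I/Os.
    return {"mirror": strips_per_disk, "spare": strips_per_disk}


def declustered_rebuild_io(num_disks=7, failed=0):
    """Strip I/Os per surviving disk in the toy declustered layout."""
    # Assumed layout: one mirrored track per pair of disks.
    tracks = list(combinations(range(num_disks), 2))
    survivors = [d for d in range(num_disks) if d != failed]

    io = Counter()
    for a, b in tracks:
        if failed not in (a, b):
            continue  # this track lost nothing
        source = b if a == failed else a
        io[source] += 1  # read the surviving replica

        # Write the rebuilt strip to the spare strip of the next
        # survivor, so the two copies land on different disks.
        idx = survivors.index(source)
        target = survivors[(idx + 1) % len(survivors)]
        io[target] += 1
    return dict(io)


conventional = conventional_rebuild_io()
declustered = declustered_rebuild_io()
print("conventional, busiest disk:", max(conventional.values()))  # 7
print("declustered, busiest disk:", max(declustered.values()))    # 2
```

In this model the total work is similar, but the declustered array spreads it so that no surviving disk handles more than one read and one write.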

The benefits of using a declustered array

1. Shorter rebuild time, lower risk of data loss

8 TB disks are on the market, and writing a whole disk takes about 30 hours. With a declustered array, more disk spindles are involved in writing during a rebuild, which greatly reduces the rebuild time.
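
To get a feel for the numbers, here is a rough back-of-the-envelope estimate. It assumes the 30-hour full-disk write quoted above, that rebuild writes are split evenly across the surviving disks, and that nothing else (network, CPU, client I/O) limits throughput. Real rebuilds compete with client traffic, so treat the result as an optimistic bound. The function name is made up for this example.

```python
def ideal_rebuild_hours(full_disk_write_hours, surviving_disks):
    """Idealised rebuild time when writes are spread evenly.

    Assumes every surviving disk absorbs an equal share of the
    rebuilt data and that the disks are the only bottleneck.
    """
    if surviving_disks < 1:
        raise ValueError("need at least one surviving disk")
    return full_disk_write_hours / surviving_disks


# 1 disk models a conventional rebuild onto a single spare disk.
for n in (1, 6, 20, 50):
    hours = ideal_rebuild_hours(30, n)
    print(f"{n:>2} disks sharing the writes: {hours:.1f} h")
```

With a single spare disk the estimate is the full 30 hours; with 6 surviving disks sharing the writes it drops to 5 hours.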

2. I/O performance improvement

Disk strips are spread over more disk spindles, which boosts I/O when it is needed.

3. More redundancy protection in large cluster environments

A declustered array can span storage nodes, so data is still safe when a certain number of storage nodes go down. This is not possible with a conventional array.