Qumulo’s Distributed File System
Scalable Block Store (SBS)
The Qumulo Scalable Block Store (SBS)
The Qumulo file system is built on top of a powerful, state-of-the-art data management system called the Qumulo Scalable Block Store (SBS). SBS applies the principles of massively scalable distributed databases and is optimized for the specialized needs of file-based data.
The Scalable Block Store is the block layer of the Qumulo file system, making that file system simpler to implement and extremely robust. SBS also gives the file system massive scalability, optimized performance, and data protection.
Here is an overview of what’s inside SBS:
SBS provides a transactional virtual layer of protected storage blocks. Instead of a system where every file must figure out its protection for itself, data protection exists beneath the file system, at the block level.
Qumulo’s block-based protection, as implemented by SBS, provides outstanding performance in environments that have petabytes of data and workloads with mixed file sizes. SBS has many benefits, including:
- Fast rebuild times in case of a failed disk drive
- The ability to continue normal file operations during rebuild operations
- No performance degradation due to contention between normal file writes and rebuild writes
- Equal storage efficiency for small files and large files
- Accurate reporting of usable space
- Efficient transactions that allow Qumulo clusters to scale to many hundreds of nodes
- Built-in tiering of hot/cold data that gives flash performance at archive prices
To understand how SBS achieves these benefits, we need to look at how it works.
Protected virtual blocks
The entire storage capacity of a Qumulo cluster is organized conceptually into a single, protected virtual address space.
Each protected address within that space stores a 4K block of bytes. By “protected” we mean that all blocks are recoverable even if multiple disks were to fail. The entire file system is stored within the protected virtual address space provided by SBS, including the directory structure, user data, file metadata, analytics, and configuration information.
In other words, the protected store acts as an interface between the file system and block-based data recorded on attached block devices. These devices might be virtual disks formed by combining SSDs and HDDs, or block-storage resources in the cloud.
Note that the blocks in the protected address space are distributed across all of the nodes (or instances) of the Qumulo cluster. However, the Qumulo file system sees only a linear array of fully-protected blocks.
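The relationship between the linear protected address space and the physical distribution of blocks can be illustrated with a toy model. The class, its round-robin placement policy, and all names here are hypothetical illustrations, not Qumulo's actual implementation:

```python
BLOCK_SIZE = 4096  # each protected address stores one 4K block

class ProtectedAddressSpace:
    """Toy model of SBS's abstraction: the file system sees a linear
    array of protected blocks, while physical placement is spread
    across the nodes of the cluster."""

    def __init__(self, num_nodes, blocks_per_node):
        self.num_nodes = num_nodes
        self.capacity = num_nodes * blocks_per_node

    def locate(self, address):
        # Round-robin placement stands in for SBS's real (and far more
        # sophisticated) distribution and protection logic.
        assert 0 <= address < self.capacity
        return ("node-%d" % (address % self.num_nodes),
                address // self.num_nodes)

space = ProtectedAddressSpace(num_nodes=4, blocks_per_node=1000)
print(space.locate(5))  # ('node-1', 1)
```

The key point of the abstraction is the one-way visibility: callers address blocks linearly and never see `locate`'s answer.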
In scalable block storage, reads and writes to the protected virtual address space are transactional.
This means that, for example, when a file system operation writes more than one block, either all of the relevant blocks are written or none of them are.
Atomic read and write operations are essential for data consistency and the correct implementation of file protocols such as SMB and NFS. For optimum performance, SBS uses techniques that maximize parallelism and distributed computing while also maintaining transactional consistency of I/O operations. For example, SBS is designed to avoid serial bottlenecks, where operations would proceed in a sequence rather than in parallel. SBS’s transaction system uses principles from the ARIES algorithm for non-blocking transactions, including write-ahead logging, repeating history during undo actions, and logging undo actions.
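The all-or-nothing behavior described above can be sketched with a minimal write-ahead log: the intent (including before-images for undo) is logged before any block is modified, so a failed multi-block write can be rolled back completely. This is an illustrative sketch of the general WAL idea, not Qumulo's design; all names are invented:

```python
class WALStore:
    """Minimal write-ahead-logging sketch: a multi-block write either
    applies every block update or undoes them all."""

    def __init__(self):
        self.blocks = {}   # address -> bytes
        self.log = []      # append-only transaction log

    def write_transaction(self, updates):
        # 1. Log intent, with undo (before) images, before touching blocks.
        undo = {addr: self.blocks.get(addr) for addr in updates}
        self.log.append(("begin", dict(updates), undo))
        try:
            # 2. Apply every block write...
            for addr, data in updates.items():
                self.blocks[addr] = data
            self.log.append(("commit",))
        except Exception:
            # 3. ...or undo them all by replaying the before-images.
            for addr, old in undo.items():
                if old is None:
                    self.blocks.pop(addr, None)
                else:
                    self.blocks[addr] = old
            self.log.append(("abort",))
            raise

store = WALStore()
store.write_transaction({0: b"inode", 1: b"data"})
```

After the transaction, both blocks are visible; a crash between steps 1 and 2 could be resolved either way from the log, which is the property the file system relies on.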
However, SBS’s implementation of transactions differs from ARIES in several important ways. SBS takes advantage of the fact that transactions initiated by the Qumulo file system are predictably short, in contrast to general-purpose databases, where transactions may be long-lived. This usage pattern lets SBS trim the transaction log frequently for efficiency, and short-lived transactions also enable faster commitment ordering.
Also, SBS’s transactions are highly distributed and do not require a globally defined, total ordering of ARIES-style sequence numbers for each transaction log entry. Instead, transaction logs are locally sequential in each bstore and are coordinated at the global level using a partial-ordering scheme that takes commitment-ordering constraints into account.
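The contrast with a global log can be made concrete: each bstore hands out its own local sequence numbers, so a distributed transaction's position is described only by per-bstore (log, LSN) pairs rather than one cluster-wide number. This is a hypothetical sketch of the partial-ordering idea, with invented names:

```python
from itertools import count

class BStoreLog:
    """Each bstore keeps its own sequential log; there is no single
    global ARIES-style log sequence number across the cluster."""

    def __init__(self, name):
        self.name = name
        self._seq = count(1)   # local, per-bstore sequence counter
        self.entries = []

    def append(self, txn_id, op):
        lsn = next(self._seq)
        self.entries.append((lsn, txn_id, op))
        # Globally, a transaction is located only by (bstore, local LSN) pairs.
        return (self.name, lsn)

# A distributed transaction touches two bstores; each logs independently.
b1, b2 = BStoreLog("bstore-1"), BStoreLog("bstore-2")
positions = [b1.append("txn-7", "write blk 12"),
             b2.append("txn-7", "write blk 90")]
```

Only the relative order of *conflicting* transactions on each bstore is constrained (the commitment-ordering constraints mentioned above); non-conflicting transactions on different bstores need no ordering at all, which is what removes the serial bottleneck.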
Qumulo DB uses a two-phase locking (2PL) protocol to implement serializability with consistent commitment ordering. Serializable operations are performed by distributed processing units (bstores) and have the property that the intended sequence of operations can be reconstructed after the fact.
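The two-phase discipline itself is simple to sketch: a transaction acquires every lock it needs (the growing phase) before releasing any (the shrinking phase). The class and method names below are illustrative only, not part of Qumulo's software:

```python
import threading

class TwoPhaseLocker:
    """Bare-bones two-phase locking: acquire all locks, do the work,
    then release all locks, never interleaving acquire and release."""

    def __init__(self):
        self.locks = {}  # resource name -> threading.Lock

    def lock_for(self, resource):
        return self.locks.setdefault(resource, threading.Lock())

    def run_transaction(self, resources, action):
        # Growing phase: acquire in a fixed (sorted) order to avoid deadlock.
        held = [self.lock_for(r) for r in sorted(resources)]
        for lock in held:
            lock.acquire()
        try:
            return action()
        finally:
            # Shrinking phase: release everything; no new locks after this.
            for lock in reversed(held):
                lock.release()

locker = TwoPhaseLocker()
result = locker.run_transaction({"blk-3", "blk-9"}, lambda: "committed")
```

Because no transaction acquires a lock after releasing one, any interleaving of such transactions is serializable; SBS's contribution, per the text, is keeping the set of locks taken to the minimum needed.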
The advantage of SBS’s approach is that the absolute minimum amount of locking is used for transactional I/O operations, and this allows Qumulo clusters to scale to many hundreds of nodes.