Here’s how Qumulo’s file system works.
It will make you rethink what’s possible with your storage.
Go under the hood and see Qumulo software’s unique abilities
Qumulo is a new kind of storage company, based entirely on advanced software. Commodity hardware running advanced, distributed software is the unchallenged basis of modern low-cost, scalable computing. This is just as true for large-scale file storage as it is for search engines and social media platforms.
How can it be so fast?
When people see Qumulo’s file system in action for the first time, they often ask: “How can it do that so fast?” Get the answers here, in our technical guide.
Clusters that work together
With Qumulo’s file system, cloud instances or computing nodes with standard hardware work together to form a cluster that has scalable performance and a single, unified file system. Multiple Qumulo clusters can in turn be linked by continuous replication to form a globally distributed but highly connected storage fabric.
Qumulo’s software is unique in how it approaches the problem of scalability. Its design incorporates principles used by modern, large-scale, distributed databases. The result is a file system with unmatched scale characteristics.
Billions of files
For massively scalable files and directories, the Qumulo file system makes extensive use of index data structures known as B-trees. B-trees keep the amount of I/O required for each operation low as the amount of data increases. With B-trees as a foundation, the cost of reading or inserting data blocks grows very slowly (logarithmically) with the amount of data.
A highly distributed scalable block store persists the B-trees across the Qumulo cluster.
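To make "grows very slowly" concrete, here is a minimal sketch that counts how many levels a B-tree-like index needs for a given number of entries. The fanout of 100 entries per node is an illustrative assumption, not a Qumulo parameter, and `levels_needed` is a hypothetical helper.

```python
def levels_needed(num_entries: int, fanout: int) -> int:
    """Return the smallest number of tree levels such that
    fanout ** levels >= num_entries.

    Each lookup touches roughly one node per level, so this is a rough
    proxy for the number of I/Os per operation.
    """
    if num_entries < 1 or fanout < 2:
        raise ValueError("need num_entries >= 1 and fanout >= 2")
    levels = 1
    capacity = fanout
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels


# Assumed fanout of 100 entries per node (illustrative only).
for entries in (1_000_000, 10_000_000_000):
    print(entries, "entries ->", levels_needed(entries, fanout=100), "levels")
```

Under that assumption, growing the index from one million to ten billion entries (a factor of 10,000) adds only two levels, from three to five.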
Qumulo’s file system provides real-time visibility and control for file systems of all sizes, even with file counts numbering in the tens of billions. Up-to-the-minute analytics allow administrators to pinpoint problems and effectively control how storage is used, and the answers to analytics queries arrive immediately. With Qumulo, storage administrators can see usage, activity and throughput at any level of the unified directory structure.
In the Qumulo file system, metadata such as bytes used and file counts are aggregated as files and directories are created or modified. This means that the information is available for timely processing without expensive file system tree walks.
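The idea of aggregating on write can be sketched as follows. This is a simplified in-memory model, not Qumulo's implementation: the `Directory` class, its field names and the choice to update every ancestor eagerly are all assumptions made for illustration.

```python
class Directory:
    """Toy directory that keeps running totals for its whole subtree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}    # subdirectory name -> Directory
        self.files = {}       # file name -> size in bytes
        self.total_bytes = 0  # aggregate over this directory's subtree
        self.total_files = 0  # aggregate over this directory's subtree

    def mkdir(self, name):
        child = Directory(name, parent=self)
        self.children[name] = child
        return child

    def write_file(self, name, size):
        """Create or overwrite a file and push the change up to the root."""
        old_size = self.files.get(name)
        delta_bytes = size - (old_size or 0)
        delta_files = 0 if old_size is not None else 1
        self.files[name] = size
        node = self
        while node is not None:  # cost depends on depth, not on file count
            node.total_bytes += delta_bytes
            node.total_files += delta_files
            node = node.parent


root = Directory("/")
projects = root.mkdir("projects")
projects.write_file("a.bin", 100)
projects.write_file("b.bin", 50)
root.write_file("readme.txt", 10)
print(root.total_bytes, root.total_files)  # totals read directly, no tree walk
```

Reading `total_bytes` or `total_files` for any directory is a single field lookup, which is why no tree walk is needed at query time.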
Just as real-time aggregation of metadata enables Qumulo’s real-time analytics, it also enables real-time capacity quotas. Quotas allow administrators to specify how much capacity a given directory is allowed to use for files.
Unlike quotas in legacy systems, Qumulo quotas take effect immediately and do not have to be provisioned. They are enforced in real time, and changes to their limits are applied immediately. Quotas can be specified at any level of the directory tree.
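A quota check built on running usage totals might look like the sketch below. The `Dir` class, the `charge` method and the `QuotaExceeded` exception are hypothetical names for this example, and the model tracks only bytes used per directory subtree.

```python
class QuotaExceeded(Exception):
    pass


class Dir:
    """Toy directory with a running usage total and an optional quota."""

    def __init__(self, name, parent=None, quota_bytes=None):
        self.name = name
        self.parent = parent
        self.quota_bytes = quota_bytes  # None means no quota on this directory
        self.used_bytes = 0             # aggregate for the whole subtree

    def _self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def charge(self, delta_bytes):
        """Account for delta_bytes of new data written under this directory."""
        # Check every quota on the path to the root before changing anything.
        for node in self._self_and_ancestors():
            limit = node.quota_bytes
            if limit is not None and node.used_bytes + delta_bytes > limit:
                raise QuotaExceeded(f"quota on {node.name!r} would be exceeded")
        # All checks passed, so update the running totals.
        for node in self._self_and_ancestors():
            node.used_bytes += delta_bytes


root = Dir("/")
team = Dir("team", parent=root, quota_bytes=100)
scratch = Dir("scratch", parent=team)

scratch.charge(80)          # fits within the 100-byte quota on "team"
team.quota_bytes = 200      # raising the limit applies to the very next write
scratch.charge(30)
print(team.used_bytes)
```

Because the usage total is already up to date, enforcing or changing a quota is a comparison against a stored number rather than a scan of the directory's contents.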
Snapshots let system administrators capture the state of a file system or directory at a given point in time. If a file or directory is modified or deleted unintentionally, users or administrators can revert it to its saved state.
Snapshots in Qumulo’s file system have an extremely efficient and scalable implementation. A single Qumulo cluster can have a virtually unlimited number of concurrent snapshots without performance or capacity degradation.
Qumulo’s file system provides continuous replication across storage clusters, whether on premises or in the public cloud. Once a replication relationship between a source cluster and a target cluster has been established and synchronized, Qumulo automatically keeps data consistent. There’s no need to manage the complex job queues for replication associated with legacy storage appliances.
Continuous replication in Qumulo’s file system leverages advanced snapshot capabilities to ensure consistent data replicas. With Qumulo snapshots, a replica on the target cluster reproduces the state of the source directory at exact moments in time. Replication relationships can be established on a per-directory basis for maximum flexibility.
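One way to picture snapshot-based replication is as a loop that ships the difference between two consecutive snapshots. The sketch below models a directory as a plain dictionary of path to contents. The function names and the diff-and-apply approach are assumptions for illustration, since the text above does not describe Qumulo's actual mechanism.

```python
def take_snapshot(live):
    """Capture a point-in-time copy of a directory (path -> bytes)."""
    return dict(live)


def diff_snapshots(old, new):
    """Return (changed, deleted) needed to turn snapshot old into new."""
    changed = {p: d for p, d in new.items() if p not in old or old[p] != d}
    deleted = [p for p in old if p not in new]
    return changed, deleted


def apply_diff(target, changed, deleted):
    """Bring the target directory to the state of the newer snapshot."""
    for path in deleted:
        target.pop(path, None)
    target.update(changed)


source = {"a.txt": b"v1", "b.txt": b"v1"}
target = {}

snap1 = take_snapshot(source)
apply_diff(target, *diff_snapshots({}, snap1))   # initial synchronization

source["a.txt"] = b"v2"                          # modify a file
del source["b.txt"]                              # delete a file
snap2 = take_snapshot(source)
source["c.txt"] = b"v1"                          # written after snap2

apply_diff(target, *diff_snapshots(snap1, snap2))
print(target == snap2)                           # target matches snap2 exactly
```

In this model the target always corresponds to one exact snapshot of the source, even while the source continues to change.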
When people are introduced to Qumulo’s real-time analytics and watch them perform at scale, their first question is usually, “How can it be that fast?” The breakthrough performance of Qumulo’s analytics is possible because of a component called QumuloDB.
QumuloDB continually maintains up-to-date metadata summaries for each directory. It uses the file system’s B-trees to collect information about the file system as changes occur. Various metadata fields are summarized inside the file system to create a virtual index. The performance analytics that you see in the GUI and can retrieve through the REST API are based on sampling mechanisms that are enabled by QumuloDB’s metadata aggregation. QumuloDB is built in and fully integrated with the file system itself.
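As an illustration of how per-directory aggregates can enable sampling, the sketch below picks a file with probability proportional to its size by descending the tree and using each child's total as a weight. This is a generic weighted-sampling technique under assumed names (`Node`, `sample_file`); it is not a description of QumuloDB's internals.

```python
import random


class Node:
    """A file (leaf with a size) or a directory (has children)."""

    def __init__(self, name, size=0, children=None):
        self.name = name
        self.children = children or []
        # Aggregate: a directory's total is the sum of its children's totals.
        if self.children:
            self.total_bytes = sum(c.total_bytes for c in self.children)
        else:
            self.total_bytes = size


def sample_file(root, rng):
    """Return the path of one file, chosen proportionally to its size."""
    if root.total_bytes <= 0:
        raise ValueError("nothing to sample")
    node = root
    path = [node.name]
    while node.children:
        target = rng.randrange(node.total_bytes)
        for child in node.children:
            if target < child.total_bytes:
                node = child
                break
            target -= child.total_bytes
        path.append(node.name)
    return "/".join(path)


tree = Node("", children=[
    Node("media", children=[Node("big.mov", size=900)]),
    Node("docs", children=[Node("notes.txt", size=100)]),
])
rng = random.Random(0)
samples = [sample_file(tree, rng) for _ in range(1000)]
print(samples.count("/media/big.mov") / len(samples))
```

Each sample visits one node per level, so a few hundred samples can estimate where capacity is concentrated without visiting every file.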
Scalable Block Store (SBS)
The Qumulo file system sits on top of a transactional virtual layer of protected storage blocks called the Scalable Block Store (SBS). Instead of a system where every file must figure out its protection for itself, data protection exists beneath the file system, at the block level. Qumulo’s block-based protection, as implemented by SBS, provides outstanding performance in environments that have petabytes of data and workloads with mixed file sizes.
SBS has many benefits, including:
- Fast rebuild times in case of a failed disk drive
- The ability to continue normal file operations during rebuild operations
- No performance degradation due to contention between normal file writes and rebuild writes
- The same storage efficiency for small files as for large files
- Accurate reporting of usable space
- Efficient transactions that allow Qumulo clusters to scale to many hundreds of nodes
- Built-in tiering of hot/cold data that gives flash performance at archive prices
- Built-in support for all-flash configurations for workloads that require the highest performance
The virtualized protected block functionality of SBS is a major advantage for the Qumulo file system. In legacy storage systems, which lack such a layer, protection occurs on a file-by-file basis or through fixed RAID groups, both of which introduce difficult problems such as long rebuild times, inefficient storage of small files and costly management of disk layouts.