Qumulo’s Distributed File Storage System
Here’s how it works.
Go under the hood and see Qumulo’s unique software-defined storage abilities.
Qumulo is an enterprise-proven, software-defined, distributed file system designed and built for the cloud. Given the unrelenting growth of unstructured data, you need a file system that can scale to billions of files. To manage this volume of data, you need to know what is going on within the file system. Qumulo software provides real-time visibility into usage, activity and throughput, without tree walks. Qumulo runs on industry-standard platforms, including HPE, Dell, AWS and GCP.
How can it be so fast?
When people see Qumulo’s file system in action for the first time, they often ask: “How can it do that so fast?” Get the answers here, in our technical guide.
Clusters that work together
With Qumulo’s file system, cloud instances or computing nodes work together to form a cluster that has scalable performance and capacity and a single, unified file system. Qumulo clusters work together to form a globally distributed but highly connected storage solution tied together with continuous replication.
Qumulo’s software is unique in how it approaches the problem of scalability. Its design incorporates principles used by modern, large-scale, distributed databases. The result is an enterprise-proven file system with unmatched scale characteristics.
Billions of files
For massively scalable files and directories, the Qumulo file system makes extensive use of index data structures known as B-trees. B-trees minimize the amount of I/O required for each operation as the amount of data increases. With B-trees as a foundation, the computational cost of reading or inserting data blocks grows very slowly as the amount of data increases.
A highly distributed scalable block store persists the B-trees across the Qumulo cluster.
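To make "grows very slowly" concrete, here is a minimal sketch of the standard textbook bound on B-tree height. It is not Qumulo code, and the minimum degree of 100 used in the example is a hypothetical value chosen only for illustration.

```python
def btree_max_height(num_keys: int, min_degree: int) -> int:
    """Worst-case height (root at height 0) of a B-tree with num_keys keys.

    Textbook fact: a B-tree of height h and minimum degree t holds at least
    2 * t**h - 1 keys, so the height is the largest h that still fits.
    """
    if num_keys < 1 or min_degree < 2:
        raise ValueError("need num_keys >= 1 and min_degree >= 2")
    height = 0
    while 2 * min_degree ** (height + 1) - 1 <= num_keys:
        height += 1
    return height


# min_degree=100 is an assumed, illustrative fan-out, not a Qumulo parameter.
for n in (10**6, 10**9, 10**12):
    print(n, btree_max_height(n, min_degree=100))
```

Under this assumption, going from a million keys to a trillion raises the worst-case height from 2 to 5, so a lookup touches only a few more nodes even though the data grew a millionfold.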
Qumulo’s file system provides real-time visibility and control for file systems of all sizes, even with file counts numbering in the tens of billions. When we say real-time, we mean real-time: there are no tree walks. In the Qumulo file system, metadata such as bytes used and file counts are aggregated as files and directories are created or modified. Real-time analytics allow administrators to pinpoint problems and effectively control how storage is being used, and the answers to their queries arrive instantly. With Qumulo, storage administrators can see usage, activity and throughput at any level of the unified directory structure.
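The idea of aggregating metadata at write time can be sketched in a few lines. This is only an illustration of the principle, assuming a simple in-memory tree; the `Directory` class and its method names are hypothetical and say nothing about how Qumulo stores or updates its aggregates internally.

```python
class Directory:
    """Hypothetical in-memory directory that keeps running subtree totals."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}    # subdirectory name -> Directory
        self.files = {}       # file name -> size in bytes
        self.total_bytes = 0  # bytes used by this directory's whole subtree
        self.total_files = 0  # files in this directory's whole subtree

    def subdir(self, name):
        """Return the named subdirectory, creating it if needed."""
        if name not in self.children:
            self.children[name] = Directory(name, parent=self)
        return self.children[name]

    def write_file(self, name, size):
        """Create or resize a file and push the change up to the root."""
        old_size = self.files.get(name)
        self.files[name] = size
        byte_delta = size - (old_size or 0)
        file_delta = 1 if old_size is None else 0
        node = self
        while node is not None:  # one step per ancestor, no tree walk
            node.total_bytes += byte_delta
            node.total_files += file_delta
            node = node.parent


root = Directory("")
render = root.subdir("projects").subdir("render")
render.write_file("frame0001.exr", 100)
render.write_file("frame0001.exr", 300)  # overwrite: only the delta propagates
root.subdir("projects").write_file("notes.txt", 50)
print(root.total_bytes, root.total_files)
```

Because each write adjusts the totals of its ancestors, a usage query at any directory is a single read of `total_bytes` or `total_files` rather than a scan of everything beneath it.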
Just as real-time aggregation of metadata enables Qumulo’s real-time analytics, it also enables real-time capacity quotas. Quotas allow administrators to specify how much capacity a given directory is allowed to use for files.
Unlike quotas in legacy systems, Qumulo quotas take effect immediately and do not have to be provisioned. They are enforced in real time, and changes to their capacities are applied immediately. Quotas can be specified at any level of the directory tree.
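A rough sketch of how running usage totals make quota checks cheap is shown below. It assumes each directory tracks the bytes used beneath it; `QuotaDir`, `QuotaExceeded` and `charge` are hypothetical names for illustration, not part of any Qumulo API.

```python
class QuotaExceeded(Exception):
    """Raised when a write would push a directory past its limit."""


class QuotaDir:
    """Hypothetical directory with an optional capacity limit."""

    def __init__(self, name, parent=None, limit_bytes=None):
        self.name = name
        self.parent = parent
        self.limit_bytes = limit_bytes  # None means no quota at this level
        self.used_bytes = 0             # running total for the whole subtree

    def _self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def charge(self, nbytes):
        """Account for nbytes of new data, or refuse if any quota is hit."""
        for node in self._self_and_ancestors():
            limit = node.limit_bytes
            if limit is not None and node.used_bytes + nbytes > limit:
                raise QuotaExceeded(f"{node.name}: limit of {limit} bytes")
        for node in self._self_and_ancestors():
            node.used_bytes += nbytes


root = QuotaDir("root")
projects = QuotaDir("projects", parent=root, limit_bytes=1000)
scratch = QuotaDir("scratch", parent=projects)

scratch.charge(800)              # fits under the 1000-byte limit on projects
try:
    scratch.charge(300)          # would reach 1100 bytes, so it is refused
except QuotaExceeded as err:
    print("rejected:", err)
projects.limit_bytes = 2000      # raising the limit needs no rescan
scratch.charge(300)              # the same write now succeeds
```

The check compares a stored total against a limit at each ancestor, which is why a quota can sit at any level of the tree and why a changed limit applies to the very next write.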
Snapshots let system administrators capture the state of a file system or directory at a given point in time. If a file or directory is modified or deleted unintentionally, users or administrators can revert it to its saved state.
Snapshots in Qumulo’s file system have an extremely efficient and scalable implementation. A single Qumulo cluster can have a virtually unlimited number of concurrent snapshots without performance or capacity degradation.
Qumulo’s file system provides continuous replication across storage clusters, whether on-prem or in the public cloud. Once a replication relationship between a source cluster and a target cluster has been established and synchronized, Qumulo automatically keeps data consistent. There’s no need to manage the complex job queues for replication associated with legacy storage appliances.
Continuous replication in Qumulo’s file system leverages advanced snapshot capabilities to ensure consistent data replicas. With Qumulo snapshots, a replica on the target cluster reproduces the state of the source directory at exact moments in time. Replication relationships can be established on a per-directory basis for maximum flexibility.
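The general pattern of snapshot-based replication can be sketched as follows. The sketch assumes a directory can be modeled as a mapping from path to contents and that a snapshot is a frozen copy of that mapping; the function names are hypothetical and the real replication protocol is not described in this guide.

```python
def take_snapshot(live):
    """Freeze the current state; bytes values are immutable, so a shallow copy is enough."""
    return dict(live)


def diff(old_snap, new_snap):
    """Return (changed_or_added, deleted) between two snapshots."""
    changed = {p: data for p, data in new_snap.items() if old_snap.get(p) != data}
    deleted = [p for p in old_snap if p not in new_snap]
    return changed, deleted


def replicate(prev_snap, new_snap, target):
    """Bring target from the state of prev_snap to the state of new_snap."""
    changed, deleted = diff(prev_snap, new_snap)
    target.update(changed)
    for path in deleted:
        target.pop(path, None)


source = {"/a.txt": b"v1", "/b.txt": b"v1"}
target = {}

snap1 = take_snapshot(source)
replicate({}, snap1, target)          # initial synchronization

source["/a.txt"] = b"v2"              # modify
del source["/b.txt"]                  # delete
source["/c.txt"] = b"v1"              # create
snap2 = take_snapshot(source)
source["/d.txt"] = b"written later"   # lands after the snapshot was taken

replicate(snap1, snap2, target)
print(target == snap2)
```

Replicating from a snapshot rather than from the live directory is what gives the target a consistent point-in-time image: the write to `/d.txt` that arrives after `snap2` is not mixed into that replica and is picked up by the next cycle.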
When people are introduced to Qumulo’s real-time analytics and watch them perform at scale, their first question is usually, “How can it be that fast?” The breakthrough performance of Qumulo’s analytics is possible because of a component called QumuloDB.
QumuloDB continually maintains up-to-date metadata summaries for each directory. It uses the file system’s B-trees to collect information about the file system as changes occur. Various metadata fields are summarized inside the file system to create a virtual index. The performance analytics that you see in the GUI, and can pull out with the REST API, are based on sampling mechanisms that are enabled by QumuloDB’s metadata aggregation. QumuloDB is built-in and fully integrated with the file system itself.
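One way aggregates can enable sampling is to use each directory's subtree total as a weight, so that a sample can be drawn by walking a single path from the root instead of listing every file. The sketch below illustrates that general technique only; `Node`, `locate` and `sample_paths` are hypothetical names, and the actual sampling mechanism inside QumuloDB may differ.

```python
import random


class Node:
    """Hypothetical file (no children) or directory (with children)."""

    def __init__(self, name, size=0, children=None):
        self.name = name
        self.children = children or []
        # Aggregate: a file's own size, or the sum over a directory's subtree.
        self.total_bytes = size + sum(c.total_bytes for c in self.children)


def locate(root, offset):
    """Return the path of the file that owns byte `offset` of the subtree."""
    if not 0 <= offset < root.total_bytes:
        raise ValueError("offset outside the subtree")
    node, parts = root, [root.name]
    while node.children:
        for child in node.children:
            if offset < child.total_bytes:
                node = child
                parts.append(child.name)
                break
            offset -= child.total_bytes  # skip this child's whole subtree
    return "/".join(parts)


def sample_paths(root, count, rng=random):
    """Draw `count` files with probability proportional to their size."""
    return [locate(root, rng.randrange(root.total_bytes)) for _ in range(count)]


tree = Node("", children=[
    Node("home", children=[Node("a.txt", 10), Node("b.bin", 90)]),
    Node("scratch", children=[Node("big.dat", 900)]),
])
print(sample_paths(tree, 5))
```

Each draw visits one node per directory level, so the cost of a sample depends on the depth of the tree rather than on the number of files in it.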
Qumulo recognizes that it is part of a larger ecosystem that often has stringent security and compliance requirements. Qumulo’s file system enables detailed audit trails that are simple to set up and integrate with standard monitoring tools. Qumulo’s auditing is also able to scale to millions of IOPS with minimal impact on performance, and will track all important items (as well as the little things such as “who deleted that file?”).
Scalable Block Store (SBS)
The Qumulo file system sits on top of a transactional virtual layer of protected storage blocks called the Scalable Block Store (SBS). Instead of a system where every file must figure out its protection for itself, data protection exists beneath the file system, at the block level. Qumulo’s block-based protection, as implemented by SBS, provides outstanding performance in environments that have petabytes of data and workloads with mixed file sizes.
SBS has many benefits, including:
- Fast rebuild times in case of a failed disk drive
- The ability to continue normal file operations during rebuild operations
- No performance degradation due to contention between normal file writes and rebuild writes
- Equal storage efficiency for small files and large files
- Accurate reporting of usable space
- Efficient transactions that allow Qumulo clusters to scale to many hundreds of nodes
- Built-in support for all-flash configurations for workloads that require the highest performance
The virtualized protected block functionality of SBS is a huge advantage for the Qumulo file system. In legacy storage systems that do not have SBS, protection occurs on a file-by-file basis or using fixed RAID groups, which introduces many difficult problems such as long rebuild times, inefficient storage of small files, and costly management of disk layouts.
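To illustrate what protection at the block level means, the sketch below protects a stripe of fixed-size blocks with a single XOR parity block and rebuilds any one lost block from the survivors. Single-parity XOR is a deliberately simple stand-in chosen for this example; this guide does not specify the encoding SBS uses, and the block size and function names here are assumptions.

```python
from functools import reduce

BLOCK_SIZE = 8  # hypothetical, tiny block size to keep the example readable


def xor_blocks(blocks):
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from all the other blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Blocks are protected the same way no matter which file their bytes belong to.
data = [b"AAAAAAAA", b"tiny\x00\x00\x00\x00", b"12345678"]
stripe = write_stripe(data)
print(rebuild(stripe, lost_index=1) == data[1])
```

Because the protection logic sees only blocks, a small file and a large file pay the same overhead per block, and a rebuild reads the surviving blocks of each stripe without needing to know which files they hold.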