Qumulo’s Distributed File System
Data mobility, linear scalability, and global reach
For scalability, Qumulo’s file system has a distributed architecture in which many individual computing nodes work together to form a cluster with scalable performance and a single, unified file system. Qumulo clusters, in turn, work together to form a globally distributed but highly connected storage fabric tied together with continuous replication relationships. Customers interact with Qumulo clusters using industry-standard file protocols, a REST API, and a web-based graphical user interface (GUI) for storage administrators.
The diagram below shows the connections between clients, Qumulo clusters composed of nodes running Qumulo Core, and the fabric formed by multiple Qumulo clusters running in multiple environments and geographic locations.
Qumulo’s file system is a modular system. As demands on a Qumulo cluster increase, you simply add nodes or instances. Capacity and performance scale linearly, whether the Qumulo cluster is operating in an on-premises data center or in the public cloud.
Massively scalable files and directories
When you have a large number of files, the directory structure and file attributes themselves become big data.
As a result, sequential processes such as tree walks, which are fundamental to legacy storage, are no longer computationally feasible. Instead, querying a large file system and managing it requires a new approach that uses parallel and distributed algorithms.
Qumulo’s file system does just that. It is unique in how it approaches the problem of scalability. Its design implements principles similar to those used by modern, large-scale, distributed databases. The result is a file system with unmatched scale characteristics. In contrast, legacy storage appliances were simply not designed to handle the massive scale of today’s data footprint.
The Qumulo file system sits on top of a virtualized block layer called the Scalable Block Store (SBS). Time-consuming work such as protection, rebuilds, and deciding which disks hold which data occurs in the SBS layer, beneath the file system.
The virtualized protected block functionality of SBS is a huge advantage for the Qumulo file system.
In legacy storage systems that do not have SBS, protection occurs on a file-by-file basis or using fixed RAID groups, which introduces many difficult problems such as long rebuild times, inefficient storage of small files, and costly management of disk layouts. Without a virtualized block layer, legacy storage systems also must implement data protection within the metadata layer itself, and this additional complexity limits the ability of these systems to optimize distributed transactions for their directory data structures and metadata.
For scalability of files and directories, the Qumulo file system makes extensive use of index data structures known as B-trees. B-trees are particularly well-suited for systems that read and write large numbers of data blocks because they are “shallow” data structures that minimize the amount of I/O required for each operation as the amount of data increases. With B-trees as a foundation, the computational cost of reading or inserting data blocks grows very slowly as the amount of data increases. B-trees are ideal for file systems and very large database indexes, for example.
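To make the "grows very slowly" claim concrete, the following minimal sketch counts how many levels a tree needs before it can address a given number of entries. The function name and the fan-out of 256 are assumptions chosen for illustration; this guide does not state the actual fan-out of Qumulo's B-trees.

```python
def btree_height(num_entries: int, branching_factor: int) -> int:
    """Return the smallest number of levels h with branching_factor**h >= num_entries.

    This is an idealized model: every node is assumed to be full.
    """
    if num_entries < 1 or branching_factor < 2:
        raise ValueError("need at least one entry and a branching factor of 2 or more")
    height = 1
    capacity = branching_factor  # entries reachable with a single level
    while capacity < num_entries:
        capacity *= branching_factor  # each extra level multiplies the reach
        height += 1
    return height


# Hypothetical fan-out of 256 pointers per block (an assumption, not a Qumulo figure).
for entries in (10**6, 10**9, 10**12):
    print(entries, btree_height(entries, 256))
```

Under this assumed fan-out, going from a million entries to a trillion adds only two levels (three versus five), which is the logarithmic growth the paragraph above describes.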
In Qumulo’s file system, B-trees are block-based. Each block is 4096 bytes. Here is an example that shows the structure of a B-tree.
Each 4K block may have pointers to other 4K blocks. This particular B-tree has a branching factor of 3, where a branching factor is the number of children (subnodes) at each node in the tree. Even if there are millions of blocks, the height of a B-tree in SBS, from the root down to where the data is located, might only be two or three levels. For example, in the diagram, to look up the label q with index value 1, the system traverses only two levels of the tree.
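The lookup described above can be sketched as a short Python example. It is an illustration only, not Qumulo's implementation: the `Block` class, the separator keys, and every label other than key 1 mapping to "q" are hypothetical, and in-memory objects stand in for 4096-byte blocks.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Block:
    """Stand-in for one block: either an internal node or a leaf."""
    keys: list
    children: list = field(default_factory=list)  # non-empty only for internal nodes
    values: list = field(default_factory=list)    # non-empty only for leaves


def lookup(root: Block, key: int):
    """Return (value, levels_visited); value is None if the key is absent."""
    node, levels = root, 1
    while node.children:
        # Separator keys split the key space: child i holds keys below keys[i].
        node = node.children[bisect_right(node.keys, key)]
        levels += 1
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], levels
    return None, levels


# A two-level tree with a branching factor of 3 (contents are made up).
root = Block(
    keys=[4, 7],
    children=[
        Block(keys=[1, 2, 3], values=["q", "r", "s"]),
        Block(keys=[4, 5, 6], values=["t", "u", "v"]),
        Block(keys=[7, 8, 9], values=["w", "x", "y"]),
    ],
)

print(lookup(root, 1))
```

Each loop iteration corresponds to reading one block, so the number of levels visited is a rough proxy for the I/O cost of the lookup.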
The Qumulo file system uses B-trees for many different purposes.
There is an inode B-tree, which acts as an index of all the files. The inode list is a standard file-system implementation technique that makes checking the consistency of the file system independent of the directory hierarchy. Inodes also help to make update operations such as directory moves efficient. Files and directories are represented as B-trees with their own key/value pairs, such as the file name, its size and its access control list (ACL) or POSIX permissions. Configuration data is also a B-tree and contains information such as the IP address and the cluster.
The Qumulo file system can be thought of as a tree of trees. Here is an example.
The top-level tree is called the super B-tree. When the system starts, processing begins there.
The root is always stored in the first address (“paddr 1.1”) of the array of virtual protected blocks. The root tree references other B-trees. If the Qumulo file system needs to access configuration data, for instance, it goes to the config B-tree. To find a particular file, the system queries the inode B-tree using the inode number as the index. The system traverses the inode tree until it finds the pointer to the file B-tree. From there, the file system can look up anything about the file. Directories are handled just like files.
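The traversal path from the super B-tree to a file can be sketched as follows. Plain dictionaries stand in for B-trees, and every key, inode number, and function name is hypothetical. In particular, storing a directory's children under an `entries` key is an assumption made for the example, not a description of Qumulo's on-disk layout.

```python
# Each dict below stands in for one B-tree of key/value pairs.
file_btree = {"name": "report.txt", "size": 4096, "mode": 0o644}
dir_btree = {"name": "projects", "mode": 0o755, "entries": {"report.txt": 3}}

# The super B-tree references the other B-trees.
super_btree = {
    "config": {"cluster_name": "example-cluster"},
    "inode": {2: dir_btree, 3: file_btree},  # inode number -> file or directory B-tree
}


def read_config(super_tree: dict, key: str):
    """Start at the super B-tree, then descend into the config B-tree."""
    return super_tree["config"][key]


def open_inode(super_tree: dict, inode_number: int) -> dict:
    """Query the inode B-tree, using the inode number as the index."""
    return super_tree["inode"][inode_number]


def lookup_child(super_tree: dict, dir_inode: int, name: str) -> dict:
    """Directories are handled like files: open the directory, then the child inode."""
    directory = open_inode(super_tree, dir_inode)
    return open_inode(super_tree, directory["entries"][name])


print(read_config(super_btree, "cluster_name"))
print(lookup_child(super_btree, 2, "report.txt")["size"])
```

Because every step starts from the super B-tree and indexes by inode number, finding a file's attributes does not require walking the directory hierarchy.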
Relying on B-trees that point to virtualized protected block storage in SBS is one of the reasons that a file system with a trillion files is feasible.