At Qumulo, we designed our file system to be hardware-agnostic.
In our first year, we interviewed hundreds of file storage customers and one of the strongest themes to emerge was a desire to have the freedom to run file storage on the latest data center hardware, as well as in the public cloud. “Customers Are Our Magnetic Field” is one of our key company principles, and with these interviews, our customers were giving us a clear mission.
Qumulo’s enterprise-grade, fault-tolerant file system runs on a variety of hardware platforms, including Dell, HPE and our own infrastructure (Note: just this week, we introduced the newest addition to our hardware platform options, the C-72T). But “hardware agnostic” doesn’t mean that we don’t have standards. We deliberately and carefully certify specific platforms for customers to use, because we want customers to have a consistently great experience. Furthermore, the ease with which we can bring up new certified platforms is remarkable. Hardware-agnostic software design makes features easier to implement, because it creates a clear abstraction layer between the software and the hardware, and it gives us higher confidence in our testing.
Many modern SSDs have the ability to write 4k blocks atomically. An SSD with this property will acknowledge a write request if, and only if, the write has landed persistently on the drive, in the correct location, in the order it was issued, without any bit errors or other corruption. The SSD must do this even in the presence of power failures. We test this thoroughly with our own power fault testing and hardware for every SSD model we ship. Doing so gives us the confidence to provide hardware atomicity guarantees to the software stack. The power fault testing methodology we use for this process is fascinating, and deserving of its own blog post in the future.
Our core block device architecture is based entirely on this atomic 4k write property. The upshot is that we can make atomic changes to data structures that are much larger than 4k by writing lists and trees built out of 4k blocks and atomically swapping the roots to update large amounts of data transactionally.
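To make the root-swap idea concrete, here is a toy sketch in Python. The `BlockDevice` class, the fixed root-pointer block, and the layout are all illustrative inventions standing in for an SSD; they are not Qumulo's actual on-disk format. The key property is that only the final 4k write of the root pointer makes the new state visible:

```python
# Toy sketch of transactional updates built on atomic 4 KiB block writes.
# BlockDevice, ROOT_BLOCK, and the layout are illustrative, not Qumulo's
# real on-disk format.

BLOCK_SIZE = 4096
ROOT_BLOCK = 0  # fixed address of the root pointer block


class BlockDevice:
    """In-memory stand-in for an SSD that writes 4 KiB blocks atomically."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write(self, addr, data):
        assert len(data) == BLOCK_SIZE  # whole-block writes only
        self.blocks[addr] = data        # atomic: all-or-nothing

    def read(self, addr):
        return self.blocks[addr]


def commit(dev, new_blocks, new_root_addr):
    """Write new tree nodes to free blocks, then publish them all at once
    with a single atomic write of the root pointer block."""
    for addr, data in new_blocks.items():
        dev.write(addr, data)           # not yet visible to readers
    root = new_root_addr.to_bytes(8, "little").ljust(BLOCK_SIZE, b"\0")
    dev.write(ROOT_BLOCK, root)         # the one write that commits


def current_root(dev):
    return int.from_bytes(dev.read(ROOT_BLOCK)[:8], "little")
```

If power fails before the final root write, readers still see the old root and none of the staged blocks, so the update is all-or-nothing.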
Abstracting away the hardware
Our hardware abstraction is Linux – specifically, we run up-to-date Ubuntu 16.04 LTS. QFSD (Qumulo File System Daemon), which is the heart of Qumulo’s software, runs as a single user-mode process. QFSD requires no kernel modifications to run; instead, it depends on readily available, open source packages that can be installed via the Ubuntu apt package manager.
When I tell people this, they usually respond with, “Oh, that’s cool! So you apt install ZFS and Samba or something and package it up with a nice UI”. This is not the case. QFSD is like a self-contained OS in its own right. It has its own block device layer, complete with an elevator and cache, that hooks into the Linux kernel via libaio. It has its own cooperative multitasking scheduler, because pthreads were too slow for our performance requirements. As a result, the dependencies we take on Linux are mostly syscalls to interface with the kernel. The kernel provides a wealth of well-supported drivers and tools that let us interact with most hardware. We also sometimes need to install and interface with proprietary vendor integration tools.
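QFSD's scheduler is proprietary native code, but the underlying idea of cooperative multitasking is easy to sketch. The generator-based toy below (all names invented) shows tasks running until they reach an explicit yield point, with no kernel threads or preemption involved:

```python
# Minimal sketch of cooperative (non-preemptive) scheduling using Python
# generators. This only illustrates the yield-instead-of-preempt model;
# it is not QFSD's actual scheduler.
from collections import deque


def run(tasks):
    """Round-robin over tasks; each runs until it voluntarily yields."""
    ready = deque(tasks)
    order = []
    while ready:
        task = ready.popleft()
        try:
            order.append(next(task))   # run until the next yield point
            ready.append(task)         # still alive: reschedule it
        except StopIteration:
            pass                       # task finished
    return order


def worker(name, steps):
    """A toy task that yields control after each unit of work."""
    for i in range(steps):
        yield f"{name}:{i}"            # explicit yield point
```

Because a task only gives up the CPU at points it chooses, there is no lock contention from preemption in the middle of a critical section, which is part of why this model can beat pthreads for a workload like a file system daemon.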
Qumulo maintains a hardware abstraction layer (HAL) that consolidates hardware information from Linux into a central location for use by QFSD. Sources of data aggregated by the HAL include the Linux kernel, kernel modules for specific hardware and vendors, command line applications for specific hardware and vendors, BIOS configuration and state, IPMI configuration and state, and more. Based on this aggregation, the HAL makes hardware-related decisions and passes this information up to QFSD.
The HAL uses the serial number of a node to determine its SKU. From there, it produces a hardware definition for this SKU. It asserts that present hardware components (CPUs, NICs, drives, etc.) are whitelisted by Qumulo for use in a node of this SKU. It determines which NICs are to be used for frontend and backend traffic. It maps low-level SSD and HDD information to QFSD’s working and backing disk abstractions. It also maps drives to bays for use in the UI. It can poll CPU temperatures, detect device hotplug events, configure hardware-based encryption, control LEDs on the chassis and drives, and so much more.
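As a rough illustration of the serial-to-SKU lookup and component whitelist check, consider the sketch below. The serial prefixes, SKU names, and component models are invented for illustration and bear no relation to Qumulo's real data:

```python
# Illustrative sketch of a HAL-style SKU lookup and whitelist check.
# All prefixes, SKU names, and component models below are invented.

SERIAL_PREFIX_TO_SKU = {"QC24": "QC24", "C72T": "C-72T"}

SKU_WHITELIST = {
    "QC24":  {"nic": {"intel-x520"}, "ssd": {"model-a", "model-b"}},
    "C-72T": {"nic": {"mlx-cx4"},    "ssd": {"model-c"}},
}


def sku_for_serial(serial):
    """Map a node's serial number to its SKU via a known prefix."""
    for prefix, sku in SERIAL_PREFIX_TO_SKU.items():
        if serial.startswith(prefix):
            return sku
    raise ValueError(f"unrecognized serial: {serial}")


def check_components(sku, inventory):
    """Return the (kind, model) pairs not whitelisted for this SKU."""
    allowed = SKU_WHITELIST[sku]
    return [(kind, model) for kind, model in inventory
            if model not in allowed.get(kind, set())]
```

In the real system, the inventory side of this check is populated from the aggregated sources described above (kernel, vendor tools, BIOS, IPMI), and a non-empty result would be surfaced as a hardware fault rather than silently tolerated.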
We ship a wide variety of Qumulo-manufactured SKUs. Qumulo file storage software runs on popular HPE and Dell server platforms. We run on AWS and GCP. Internally, we increase our test coverage by hosting Qumulo VMs on VMware. We also test by running sandboxed “simnodes” as multiple QFSD processes on the same host, using flat files on the host as block devices and loopback for networking. As part of a Qumulo hackathon, one of our internal devs got the product up and running on his gaming tower at home with a bunch of discarded SSDs from our lab wired haphazardly to various SATA ports on his motherboard (we don’t recommend this configuration).
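The flat-file trick behind simnodes is simple to sketch. Something like the snippet below (class name and layout invented; simnodes are QFSD processes, not Python) is enough to make an ordinary host file behave like a 4k block device:

```python
# Sketch of the "simnode" idea: a flat file on the host standing in for
# a block device. The class and layout are illustrative only.
import os

BLOCK_SIZE = 4096


class FileBlockDevice:
    """Expose a regular file as an array of 4 KiB blocks."""

    def __init__(self, path, num_blocks):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, num_blocks * BLOCK_SIZE)  # size the "disk"

    def write_block(self, addr, data):
        assert len(data) == BLOCK_SIZE
        os.pwrite(self.fd, data, addr * BLOCK_SIZE)

    def read_block(self, addr):
        return os.pread(self.fd, BLOCK_SIZE, addr * BLOCK_SIZE)

    def close(self):
        os.close(self.fd)
```

Because the block-device layer is an abstraction, the software above it cannot tell whether it is talking to an NVMe drive, a cloud volume, or a file like this one, which is exactly what makes this kind of sandboxed testing cheap.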
Qumulo is a software shop, but we do have an in-house hardware team. In the next entry, I’ll describe the role of this team and how it fits into the broader Qumulo mission of hardware independence.