By Daniel Pehush and Gunter Zink
The mission: Early beginnings
A long, long time ago, on a chilly, dreary afternoon in the late fall of 2016, Jason Sturgeon, our hardware product owner here at Qumulo, told the hardware engineering team that our current and potential customers wanted an all-flash product.
They wanted Qumulo to add a new tier of storage platforms to our portfolio: a faster and flashier product.
Our competition locks customers into customized hardware; our mission as a company is to avoid doing so. Looking at other all-flash storage solutions as a starting point, we explored several blade-based platforms. In further discussions of our partners' offerings, we found a few solutions that had the density, cost, and form-factor attributes we were looking for.
The major decision for this platform was whether to use SATA/SAS SSDs or NVMe SSDs. Vendors had platforms in interesting form factors that could take either. Following the pattern of our competitors, we looked at cloud-focused systems that had multiple servers in a single physical chassis. We considered options such as a 1U chassis that could hold 12 SATA 2.5” SSDs or 12 NVMe SSDs, with two compute nodes within that chassis, or a 2U chassis that could hold 24 SATA SSDs or 24 NVMe SSDs, with four compute nodes within a single chassis. A single 2U that contains four servers!
Input from customers and vendor partners
While looking at these servers, having vendors stop by our HQ, and bringing in samples for inspection, we were also talking to our customers. Customers are our magnetic field, so we let them be our guiding star for building the right solution. Embarking on making a new platform, especially a whole new class of platform, we consulted our customers, both current and potential.
Careful not to make decisions in a bubble, we also consulted our vendors, as they are key partners in making a platform successful. Working with customers and vendors alike results in the creation, delivery, and usage of a product which improves the end user experience.
NVMe or bust!
One very clear message came out of these discussions: NVMe or bust. NVMe is the future of flash!
Given that NVMe SSDs would soon be at price parity with SATA SSDs and provide huge performance benefits over SATA/SAS drives, NVMe was the choice for vendors and our customers. Qumulo always looks to where data needs will be in years to come, and we wanted a forward-looking platform with years of headroom. As such, Qumulo chose to take the leap into the glorious future and build its first all-flash platform on NVMe SSD technology.
However, while researching how to use NVMe with the platforms then available, we found a painful deficiency.
Available platforms were not based on the SkyLake architecture, which wouldn’t be rolling out via the various server and chassis vendors for some time. A standard, called Volume Management Device, for managing NVMe device hot swap was being developed and launched alongside the SkyLake architecture. All NVMe implementations up until this technology launch relied on proprietary software to manage hot swap of an NVMe device.
As a software startup, Qumulo spends its cycles delivering value. Developing a software feature or modifying the kernel to handle the sudden disappearance and reappearance of a PCIe device was not something we were willing to sign up for, especially when a technology release on the horizon would deliver the feature we needed at no cost to us.
A different opportunity for the hardware team presented itself, so while the technology around NVMe hot swap was not fully baked, we shelved this platform to revisit later, when we could deliver customer value without sacrificing other vital features.
The train leaves the station
Six months passed, and now the technology had the features we needed. We considered a number of architectures. Intel had just released its Xeon Scalable processors (aka SkyLake), and AMD’s EPYC CPUs were about to be released. We chose Intel SkyLake because the AMD EPYC CPUs have a higher NUMA node count. The software development effort needed to handle the higher NUMA domain count wouldn’t have provided enough value for our customers to justify undertaking it.
To determine the specific CPU to use, we brought two models in house to test. In selecting the CPU, we considered the Thermal Design Power (TDP) of the processors, as we knew our all-flash product would be fast, but it would also be hot! Cooling a 2U server with 24 NVMe U.2 devices, each capable of dissipating 25W, is a bit daunting. Twenty-four drives at 25W each add up to 600W. Realistically, each drive at maximum write workload will draw only half of its potential power. Still, you must design in a safety margin to handle unexpected spikes in power draw, and design to the specification of the components you use.
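The drive power arithmetic above can be written out as a short sketch. The drive count, the 25W per-drive maximum, and the "half of potential" figure come from the text; the function name and the idea of reporting the gap between worst case and expected draw as "headroom" are illustrative assumptions, not a description of how Qumulo sized its cooling.

```python
# Figures taken from the text above.
DRIVE_COUNT = 24            # NVMe U.2 devices in the 2U server
MAX_DRIVE_POWER_W = 25.0    # maximum dissipation per drive, in watts
REALISTIC_FRACTION = 0.5    # share of the maximum drawn at full write load


def drive_power_budget(drive_count, max_power_w, realistic_fraction):
    """Return (worst_case_w, expected_w, headroom_w) for the drive bays.

    Hypothetical helper: worst case assumes every drive dissipates its
    rated maximum at once; expected scales that by the realistic fraction.
    """
    worst_case_w = drive_count * max_power_w
    expected_w = worst_case_w * realistic_fraction
    headroom_w = worst_case_w - expected_w
    return worst_case_w, expected_w, headroom_w


worst, expected, headroom = drive_power_budget(
    DRIVE_COUNT, MAX_DRIVE_POWER_W, REALISTIC_FRACTION
)
print(f"Worst case: {worst:.0f} W")
print(f"Expected under max write load: {expected:.0f} W")
print(f"Headroom if designed to spec: {headroom:.0f} W")
```

Designing to the 600W worst case rather than the expected draw is what leaves room for unexpected spikes.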
Since this was to be our fastest system, you might assume we would just pick the fastest CPU available. Instead, we chose a CPU that delivers the best value-to-cost ratio for our customers. This led us to the Intel Xeon Gold 6126 processor, which has a lower core count and a higher clock frequency that our software is able to take advantage of, and therefore delivers the best value to our customers.
In spring of 2017, we were having discussions with vendors again, looking at the various platforms that we could turn into the all-flash NVMe solution desired by our customers. At that point, the blade solution was still an idea, so we took a look at what was available.
We identified concrete constraints of the product. Flash is expensive and cost was a big factor. While we were making a Bugatti Veyron of storage products, it still needed to be sellable at a Dodge Viper price.
It needed to be fast, but how fast is fast enough to delight our customers?
We aimed to create a hardware box that would be more than capable of 4 GB/s per node for multi-stream reads, with plenty of headroom to grow as we tuned it. We chose to aim for 125K IOPS per node. We needed around 40TB per rack unit (U) to deliver a compelling product that our customers would love. We homed in on the optimal platform options and decided on a 1U and a 2U prototype for proof-of-concept work.
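To make the density target concrete, here is a rough sketch of the per-drive capacity it implies. The 40TB-per-U target comes from the text, and the two chassis shapes are the 1U/12-drive and 2U/24-drive options mentioned earlier. The sketch assumes every bay is populated and counts raw capacity only, ignoring any data protection overhead; those assumptions and the function name are illustrative, and the actual prototypes may have differed.

```python
TARGET_TB_PER_RACK_U = 40.0  # density target from the text


def required_tb_per_drive(chassis_height_u, drive_count,
                          target_tb_per_u=TARGET_TB_PER_RACK_U):
    """Raw capacity each drive needs for the chassis to hit the target.

    Hypothetical helper: assumes all bays are filled with equal-size
    drives and ignores usable-versus-raw capacity.
    """
    return chassis_height_u * target_tb_per_u / drive_count


# Chassis options discussed earlier in the post: (label, height in U, drives).
for label, height_u, drives in [("1U, 12 drives", 1, 12),
                                ("2U, 24 drives", 2, 24)]:
    per_drive = required_tb_per_drive(height_u, drives)
    print(f"{label}: {per_drive:.2f} TB per drive")
```

Both options have the same number of drive bays per rack unit, so they need the same per-drive capacity to reach the target.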
We tested Qumulo software on the prototype boxes and voila! We had a complete but not-yet-sellable all-flash product!
Thanks to our hardware abstraction layer, we needed only minimal changes in that layer of code to run on foreign hardware in no time. Another win for making our software hardware agnostic.
Stay tuned for part two of the exciting series!
We’re engineers, builders, craftsmen and artists. We build products that are always on, that never lose data, and that store and retrieve data fast.