We are pleased to announce a bigger, faster, stronger version of Qumulo Core. The spirit of these words has been used to describe everything from our Olympic teams to Steve Austin in “The Six Million Dollar Man”. Just as Steve Austin had surgery to replace normal components with bionic ones, we have replaced a functional component with a better one built with state-of-the-art technology. What bionics were to Mr. Austin, erasure coding is to Qumulo!
Erasure coding protection is a huge advancement for Qumulo Core. As a storage administrator or user, you will experience little, if any, difference in how you interact with a Qumulo cluster. But just as Steve Austin looked like a normal human and wasn’t, major changes have been made inside the code. The power of erasure coding is in the way it protects the data on your cluster while also enabling you to use cluster space more efficiently.
Erasure coding increases your protection against the inevitability that disk drives will fail. Should 2 drives fail at the same time on your cluster, no problem. Your data is guaranteed to be safe. By combining erasure coding’s enhanced protection with the industry’s fastest drive reprotect times, Qumulo can build clusters in sizes far beyond anything in use today.
To achieve this, Qumulo Core creates 6 blocks for every piece of data written to the cluster. Four of those blocks contain the data itself, separated into equal-size pieces, and two are parity blocks used solely to recreate data that may be lost due to a drive failure. Using Reed-Solomon to manage the required math, Qumulo Core can recreate any lost piece of data by using any four of the remaining blocks. It doesn’t matter which four, as long as there are four. Because we never place two of the 6 blocks on the same drive, losing 2 drives costs at most 2 blocks, so at least 4 blocks are guaranteed to remain. Below is an example of how erasure coding uses parity to protect data against 2 concurrent drive failures:
In this example, the four pieces of the file are each 4 bytes long. Each piece is one row of the matrix. The first one is “ABCD”. The second one is “EFGH”. And so on.
The Reed-Solomon algorithm creates a coding matrix that you multiply with your data matrix to create the coded data. The matrix is set up so that the first four rows of the result are the same as the first four rows of the input. That means that the data is left intact, and all it’s really doing is computing the parity.
You lose 2 drives!
Applying the inverse matrix leaves the data in this state, which gives the equation for reconstructing the original data from the pieces that are still available:
Now, apply simple algebra.
Voila! Your data has been reconstructed after a 2-drive failure.
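The whole walkthrough can be sketched end to end in Python: encode, drop two blocks, invert the surviving rows of the coding matrix, and multiply to get the data back. Everything specific here is an assumption made for illustration and is not taken from Qumulo Core: the parity rows, the field GF(2^8) with polynomial 0x11d, the pieces “IJKL” and “MNOP”, and all function names.

```python
from itertools import combinations

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8), found by brute force."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def mat_mul(A, B):
    """Matrix product over GF(2^8); addition is XOR."""
    out = []
    for row in A:
        out_row = []
        for j in range(len(B[0])):
            acc = 0
            for k, coeff in enumerate(row):
                acc ^= gf_mul(coeff, B[k][j])
            out_row.append(acc)
        out.append(out_row)
    return out

def mat_inv(M):
    """Invert a square matrix over GF(2^8) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Illustrative coding matrix: identity rows for data, two assumed parity rows.
CODING = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 2, 4, 8],
]

def reconstruct(surviving):
    """surviving maps block index -> block bytes; any four entries suffice."""
    indexes = sorted(surviving)[:4]
    sub_matrix = [CODING[i] for i in indexes]      # rows for blocks we still have
    blocks = [surviving[i] for i in indexes]
    return mat_mul(mat_inv(sub_matrix), blocks)    # data = inverse * survivors

data = [list(b"ABCD"), list(b"EFGH"), list(b"IJKL"), list(b"MNOP")]
coded = mat_mul(CODING, data)

# Try every possible pair of lost blocks (15 pairs in total).
recovered_every_time = all(
    reconstruct({i: coded[i] for i in range(6) if i not in lost}) == data
    for lost in combinations(range(6), 2)
)
print("recovered from every 2-block loss:", recovered_every_time)
```

The loop covers every possible pair of lost blocks, so it exercises the “it doesn’t matter which four” property for this illustrative matrix rather than for a single hand-picked failure.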
Erasure coding enables you to use more of your raw disk space for storing data. Mirroring protects your data by making 2 copies of everything on your cluster, so half of your space is dedicated to protection. Qumulo’s erasure coding dedicates 33% less space to protection: with our initial implementation, only one-third of your raw space goes to protection, allowing you to get up to 67% efficiency on your drives. In future releases the percentage of usable space will get even larger.
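The arithmetic behind those percentages is short enough to check directly. This is a minimal sketch, and the function name usable_fraction is made up for the example: it treats mirroring as one data copy plus one protection copy, and the initial erasure coding scheme as four data blocks plus two parity blocks.

```python
from fractions import Fraction

def usable_fraction(data_blocks, protection_blocks):
    """Fraction of raw space that holds user data."""
    return Fraction(data_blocks, data_blocks + protection_blocks)

mirroring = usable_fraction(1, 1)        # one data copy plus one mirror copy
erasure_coding = usable_fraction(4, 2)   # four data blocks plus two parity blocks

overhead_mirroring = 1 - mirroring       # share of raw space spent on protection
overhead_erasure = 1 - erasure_coding
protection_reduction = 1 - overhead_erasure / overhead_mirroring

print(f"mirroring usable:       {float(mirroring):.0%}")
print(f"erasure coding usable:  {float(erasure_coding):.0%}")
print(f"less space on protection: {float(protection_reduction):.0%}")
```

Mirroring comes out at one-half usable and 4+2 erasure coding at two-thirds usable, so the share of raw space spent on protection drops from one-half to one-third, a reduction of one-third.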
In the future, we will offer options for additional encoding schemes that provide significant increases to the amount of usable space on the cluster while maintaining protection against 2 drive failures. The user interface is slightly modified to provide a more accurate picture of the protection level of your cluster. The cluster overview page is now dedicated to data protection and no longer mixes this information with data availability. An availability problem is a case where the cluster is offline with no risk to your data, as you might find if a node goes offline because of networking or some other non-disk failure.
This first release doesn’t have everything. Work is still underway to improve performance and the user experience in degraded mode (a down disk or down node), and certain non-read/write performance metrics, such as delete speed, still require work.
The implementation of erasure coding creates more resiliency against drive failures, allows for massive cluster sizes and lets you use the raw space on your cluster more efficiently. In short, Qumulo Core is now bigger, faster and stronger.
Ben Gitenstein runs Product at Qumulo. He and his team of product managers and data scientists have conducted nearly 1,000 interviews with storage users and analyzed millions of data points to understand customer needs and the direction of the storage market. Prior to working at Qumulo, Ben spent five years at Microsoft, where he split his time between Corporate Strategy and Product Planning.