Find out how to turn AWS EBS sc1 into a powerful storage medium for your production workloads.
In early 2020, we were hearing from customers that they wanted to store more file data in the cloud on Qumulo, but the AWS EBS st1 infrastructure cost exploded once they went above 50 or so TB. We broke down the cost of our solution, looked at it from many different angles, and saw that as you scaled up in capacity, the cost was dominated by EBS. Later that year, after more testing and some changes to how our product handles high-latency disks, we were able to offer configurations that dropped the price by up to 70%. How did we do that?
First, some background.
Qumulo on AWS uses EBS gp2 volumes for SSD and st1 volumes for HDD
The Qumulo File Data Platform is a highly available distributed file system that runs across multiple identically configured nodes in a cluster. Every node in the cluster is an equal participant, and other machines or users needing to access data on a Qumulo system may do so from any node – and they’ll all see the exact same state.
This allows you to scale to serve thousands of connected clients, and some configurations are able to hit over 1M IOPS and tens of GB/s. Your data is protected from loss of an EBS volume (EBS advertises an annual failure rate of 0.1%-0.2%) and is still accessible even if a node is rebooted or shut down temporarily (i.e. for maintenance).
Qumulo has two main modes of operation. The first is all-SSD: your data is written to and read from SSD and is never moved anywhere else. Think of this as a single-tier platform designed to provide consistently low latency for things like video editing, playback, and rendering. The second is a two-tier mode consisting of a thin layer of SSDs that act as a persistent cache, with the “storage” actually backed by a large bank of lower-cost HDDs. This was the configuration we focused on making more cost effective for customers.
Our AWS product uses EBS gp2 volumes for SSD and st1 volumes for HDD. As you increase the capacity of your Qumulo cluster from 50TB, to 100TB, to 200TB, to 400TB, the cost of st1 begins to dominate the cost of the entire solution. And it gets expensive really fast. (RELATED: How to Budget and Control Costs for a High-Performing Data Strategy)
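To see why, a quick back-of-envelope calculation helps. The sketch below assumes an st1 list price of roughly $0.045 per GiB-month (on-demand; actual pricing varies by region and enterprise agreement) and looks at the HDD line item alone, ignoring instances and the SSD tier:

```python
# Rough monthly cost of the st1 HDD tier alone as cluster capacity grows.
# Assumption: ~$0.045 per GiB-month for st1 (on-demand list price); real
# pricing varies by region and enterprise agreement.
ST1_PRICE_PER_GIB_MONTH = 0.045

for capacity_tib in (50, 100, 200, 400):
    gib = capacity_tib * 1024
    print(f"{capacity_tib} TiB of st1 ≈ ${gib * ST1_PRICE_PER_GIB_MONTH:,.0f}/month")
```

Doubling the capacity roughly doubles that line item, which is why the HDD volumes come to dominate the overall bill.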
Now, if you know anything about AWS EBS, you may be aware that AWS has another, less expensive class of HDD called sc1. It’s advertised as: “These (sc1) are ideal for infrequently accessed workloads with large, cold datasets with large I/O sizes, where the dominant performance attribute is throughput (MiB/s).”
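Operationally, nothing special is required to use sc1; it is provisioned like any other EBS volume type, and only the VolumeType differs. Here is a minimal boto3 sketch, where the region, availability zone, size, and tag are placeholder values for illustration:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Create a 16 TiB sc1 volume. Only VolumeType distinguishes it from st1 or gp2;
# the zone, size, and tag here are placeholders, not Qumulo-specific settings.
volume = ec2.create_volume(
    AvailabilityZone="us-west-2a",
    Size=16384,  # GiB
    VolumeType="sc1",
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "purpose", "Value": "cold-tier-hdd"}],
    }],
)
print(volume["VolumeId"])
```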
Qumulo experimented with sc1 early on but saw troubling data. The latency was much higher, and the cost did not seem to reflect the performance difference.
IO latency of sc1 relative to st1, from performance tests early in 2020:

|  | Read | Write |
| --- | --- | --- |
| Sequential | +72% | +159% |
| Random | +162% | +213% |
Plus, we already knew that our performance was impacted by the IO latency overhead of EBS being network attached. Some anecdotal testing with our software also did not look promising. We tabled sc1 as too risky to suggest to customers at that time.
Customers want cost-effective enterprise storage on AWS EBS sc1
Late last year, Qumulo looked again, after having spent multiple long, expensive dev cycles tuning performance on its system. And because we had customers specifically asking us for sc1, we said: well, let's give it a shot one more time. We did some testing, found some bottlenecks in our code to tune, and then started benchmarking to get an idea of where we landed.
We were pleasantly surprised.
The maximum multi-stream performance on a cluster with sc1 as backing matched and occasionally exceeded the performance of an equivalent cluster running st1. While we realize this is not always going to happen for every workload, it shows how far our technology has come to help mask the effect of using cost-effective storage tiers in an enterprise file system.
Cost-Effective Tiers in an Enterprise File System
A sample comparison of sc1 versus st1, from 2021:
| Cluster | AWS Infra Cost Per Month (us-west-2) | Maximum Multi-Stream Write (MiB/s) | Maximum Multi-Stream Read, Cached or From SSD (MiB/s) | Maximum Multi-Stream Read From HDD (MiB/s) |
| --- | --- | --- | --- | --- |
| m5.12xlarge, 4 x 55TiB st1 | $17,989.28 | 1,635 | 3,192 | 3,202 |
| m5.12xlarge, 4 x 55TiB sc1 | $11,230.88 (-37%) | 1,683 | 3,135 | 3,194 |
The values in the table above are provided for informational purposes only and are representative of a single multi-stream test Qumulo conducted with a 4-node cluster and m5.12xlarge instances. Pricing will vary based on enterprise agreements as well as desired deployment region. Performance can vary depending on networking latency, instance type, and node count.
How is this possible today?
Since inception, Qumulo has focused on optimizing the product and improving performance. With our software-defined solution, the only lever to pull is making our software smarter, better, and faster. This might seem like a disadvantage (why not just cheat with proprietary hardware?), but the upside is that these improvements carry everywhere we run, not just on-prem.
Optimization at Qumulo means we:
- Always write to flash, full stop. We never involve slow HDDs in the hot path when writing data.
- Always cache the data you write in memory as well, so that if you ask for the same data right back, we don't have to read a disk or ask other nodes for it.
- Move data to HDD in the background, out of sight, out of mind, as needed, automatically. Think of this as automated tiering to cheaper storage. We always move the coldest data, and we do this under the file system, at the block level. So if you run the Linux file command over and over, reading only the first 4KB of a file, we treat that first block as hot and the rest of the file as cold. This maximizes the use of the SSD cache.
- Try SSD when you request data that is not in cache. If it's not on SSD, we read it from HDD and also start prefetching the next block in the file so it's already in our memory cache before you ask for it. This makes reading from HDD seem fast, because the latency is masked by the prefetcher. We also do this when you read files sequentially in a directory (see the sketch after this list).
- Move blocks you’ve read from HDD a few times back to SSD. Why? There's no reason to keep data on the slower medium if you are going to keep asking for it. We automatically promote it for you – no tuning or policies required.
- Optimize our block allocation system so that no time is spent waiting for an address to write data to.
- Own almost the entire code stack for everything in the product, right down to the data structures and algorithms. We can tune a bad sort or a hash map that is hitting bad cache lines like nobody’s business. We even wrote the NFS and SMB implementations from scratch, and are able to tune everything top to bottom in our stack.
- Run entirely in userspace, have our own task scheduler to avoid kernel context switching, and do everything via asynchronous IO.
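To make the interplay of these behaviors concrete, here is a deliberately simplified read path in Python. This is not Qumulo's code; the promotion threshold, single-block prefetch, and dictionary-backed tiers are illustrative assumptions only:

```python
from collections import defaultdict

PROMOTE_AFTER_READS = 3  # hypothetical threshold for moving a block back to SSD


class TieredReadPath:
    """Toy model of a cache-first read path: RAM -> SSD -> HDD, with prefetch
    of the next block and promotion of blocks that keep being read from HDD."""

    def __init__(self, ssd, hdd):
        self.ram_cache = {}                   # block_id -> bytes
        self.ssd = ssd                        # dict-like fast tier (e.g. gp2)
        self.hdd = hdd                        # dict-like cold tier (e.g. sc1)
        self.hdd_read_counts = defaultdict(int)

    def read_block(self, block_id):
        # 1. Serve from RAM if we recently wrote or read this block.
        if block_id in self.ram_cache:
            return self.ram_cache[block_id]

        # 2. Try the SSD tier next.
        if block_id in self.ssd:
            data = self.ssd[block_id]
        else:
            # 3. Fall back to HDD, and prefetch the next block so a
            #    sequential reader finds it already in memory.
            data = self.hdd[block_id]
            self.hdd_read_counts[block_id] += 1
            next_id = block_id + 1
            if next_id in self.hdd and next_id not in self.ram_cache:
                self.ram_cache[next_id] = self.hdd[next_id]

            # 4. Promote blocks that keep being read from the cold tier.
            if self.hdd_read_counts[block_id] >= PROMOTE_AFTER_READS:
                self.ssd[block_id] = data

        self.ram_cache[block_id] = data
        return data
```

A real implementation would bound the in-memory cache, prefetch asynchronously and further ahead than one block, and make tiering and promotion decisions under the file system at the block level, as described above.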
Turning AWS EBS sc1 into a powerful storage medium for your production workloads
It takes time to build a scale-out storage system like Qumulo's file data platform, but the value of doing so shows. We effectively turn sc1, an EBS volume type that Amazon doesn't recommend for "active workloads," into a powerful storage medium for your production workloads. At $0.015 per GiB-month, versus S3 at $0.021 per GiB-month at its cheapest tier, EBS sc1 is actually about 28% less than S3 Standard, and that's just storage! S3 charges for API calls on top of storage.
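Using the per-GiB prices quoted above, the math is easy to check; the 500 TiB below is just an example capacity, and the comparison ignores S3 request charges:

```python
SC1_PRICE = 0.015  # $/GiB-month for EBS sc1, as quoted above
S3_PRICE = 0.021   # $/GiB-month for S3 Standard at its cheapest tier, as quoted above

savings = 1 - SC1_PRICE / S3_PRICE
print(f"sc1 is about {savings:.1%} cheaper per GiB-month than S3 Standard")  # ~28.6%

# Example: 500 TiB stored for one month (storage cost only).
gib = 500 * 1024
print(f"sc1: ${gib * SC1_PRICE:,.0f}/month vs S3: ${gib * S3_PRICE:,.0f}/month")
```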
Learn more
- Reduce Costs and Gain Flexibility on AWS with Qumulo Cloud Q
- Qumulo on AWS: Video Cloud Services for M&E Creative Studios