Recently, an old friend and I were catching up and exchanging developer horror stories from over the years. He was venting about a service he recently inherited and now had to maintain. As a Qumulo Developer, I couldn’t help but get excited, because all the pain he was feeling was exactly where Qumulo is great!
At a high level, my friend was building a service that had to keep track of a few billion user-created assets. The assets themselves were bundled into user-defined buckets, where each asset could be in one and only one bucket. The service provided the following operations:
- Create a bucket
- Add new assets to a bucket
- List and retrieve the assets in a bucket
- Move assets from one bucket to another
- Modify the content of an asset
The service was running in AWS and needed to scale easily to billions of assets and beyond. He had a tight deadline, as the existing asset tracking solution couldn’t scale. He didn’t want to build a database and write a bunch of code to glue together the storage and his bucket and asset management logic, so he decided to build his service on top of S3. At first glance, it seemed really easy to implement this directly on S3. Each asset could be an S3 object with an object key of, say, /bucket_name/asset_name. S3 objects can be added, modified, renamed (by copying to a new key and deleting the old one), and listed with bucket_name as the matching prefix, which covered all his use cases. All he had to do was implement a simple wrapper service around the S3 API and his service was complete. Or so he hoped.
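To make the key scheme concrete, here is a minimal sketch of the wrapper idea. It uses a plain Python dict as a stand-in for the object store, so none of this is the real S3 API, and the function names (object_key, put_asset, list_assets, move_asset) are made up for illustration. A move is written as a copy followed by a delete, since an object store has no rename of its own.

```python
# In-memory stand-in for an object store. This is NOT the S3 API;
# all names below are hypothetical and only illustrate the key scheme.
store: dict[str, bytes] = {}


def object_key(bucket_name: str, asset_name: str) -> str:
    """Build the /bucket_name/asset_name key described in the text."""
    return f"/{bucket_name}/{asset_name}"


def put_asset(bucket_name: str, asset_name: str, data: bytes) -> None:
    """Add a new asset, or overwrite (modify) an existing one."""
    store[object_key(bucket_name, asset_name)] = data


def list_assets(bucket_name: str) -> list[str]:
    """List asset names by prefix match on the bucket name."""
    # The trailing slash keeps "photos" from also matching "photos2".
    prefix = f"/{bucket_name}/"
    return sorted(key[len(prefix):] for key in store if key.startswith(prefix))


def move_asset(src_bucket: str, dst_bucket: str, asset_name: str) -> None:
    """Move an asset between buckets as copy-then-delete (two steps)."""
    src = object_key(src_bucket, asset_name)
    dst = object_key(dst_bucket, asset_name)
    store[dst] = store[src]  # step 1: copy to the new key
    del store[src]           # step 2: delete the old key


put_asset("photos", "cat.jpg", b"v1")
move_asset("photos", "archive", "cat.jpg")
print(list_assets("photos"), list_assets("archive"))
```

Note that the move is two separate steps. Between them the asset exists under both keys, which is harmless in a single-threaded dict but matters once the store is shared and not immediately consistent.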
Seeing as we were exchanging horror stories, he quickly started detailing all the various dragons he faced, and I couldn’t help but point out how Qumulo for AWS would have made all of that pain go away.
The pain started with a continuous analysis pipeline which was built on top of his service. An ingestion tool would put assets in a bucket, then notify an SQS work queue that the asset needed analysis. At that point the analysis pipeline would pull the work item off the SQS queue, read the asset, modify it, then notify another SQS work queue for any further required analysis steps. This analysis pipeline worked great against the old unscalable system. Unfortunately, it was built on the assumption that the storage service operations were atomic and consistent. It turns out that wasn’t the case with the new system: S3 was, at the time, an eventually consistent storage system.
It’s important to note that the S3 solution worked well… most of the time. Every now and then, this eventual consistency would cause all sorts of concurrency problems with this analysis pipeline. Newly-added assets would not appear during subsequent list operations, causing “missing asset” problems in the analysis pipeline. Modifications of assets wouldn’t appear to further stages of the pipeline, as subsequent reads would get the original stale data. The list of eventual consistency problems went on and on.
At this point, I couldn’t help but point out that Qumulo is a strongly consistent storage system. File creates, writes, and renames always appear immediately to every client. The problems he was describing simply don’t exist on Qumulo storage.
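For comparison, here is a rough sketch of the same five operations written against a file system, assuming the storage is mounted on the host like any other directory. The class name FileBackedAssets and the root path are hypothetical, and a local temporary directory stands in for the mount, so this shows the shape of the code rather than any cross-client behavior.

```python
import os
import tempfile
from pathlib import Path


class FileBackedAssets:
    """Buckets are directories and assets are files under `root`.

    `root` is assumed to be a mounted file system path (hypothetical here).
    """

    def __init__(self, root: Path) -> None:
        self.root = root

    def create_bucket(self, bucket: str) -> None:
        (self.root / bucket).mkdir()

    def add_asset(self, bucket: str, asset: str, data: bytes) -> None:
        (self.root / bucket / asset).write_bytes(data)

    def list_assets(self, bucket: str) -> list[str]:
        return sorted(p.name for p in (self.root / bucket).iterdir())

    def read_asset(self, bucket: str, asset: str) -> bytes:
        return (self.root / bucket / asset).read_bytes()

    def modify_asset(self, bucket: str, asset: str, data: bytes) -> None:
        # Modify in place: overwrite the existing file's contents.
        (self.root / bucket / asset).write_bytes(data)

    def move_asset(self, src: str, dst: str, asset: str) -> None:
        # A single rename: the asset is never listed in both buckets.
        os.rename(self.root / src / asset, self.root / dst / asset)


# A local temp directory stands in for the mounted file system.
svc = FileBackedAssets(Path(tempfile.mkdtemp()))
svc.create_bucket("photos")
svc.create_bucket("archive")
svc.add_asset("photos", "cat.jpg", b"v1")
svc.modify_asset("photos", "cat.jpg", b"v2")
svc.move_asset("photos", "archive", "cat.jpg")
print(svc.list_assets("photos"), svc.list_assets("archive"))
print(svc.read_asset("archive", "cat.jpg"))
```

The move is one rename call rather than a copy followed by a delete, so the "one and only one bucket" rule falls out of the directory structure itself.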
S3’s eventual consistency problems cost my friend months building up a side database to keep track of his assets. He had to modify assets out-of-place by writing modifications as new S3 objects to avoid stale reads. Moving assets between buckets became a complicated database-assisted operation, with bucket membership now controlled by the side database so that eventually consistent objects would not appear in both buckets after a move. He now had to design and manage this metadata database and make sure it was also scalable to billions of assets. He even had to re-architect the analysis pipeline to expect and handle S3’s eventually consistent behavior by re-queueing work when the asset hadn’t yet appeared. So much for a simple project.
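The re-queueing workaround looks roughly like the sketch below. It is not my friend's actual code: the function drain, the (key, attempts) work item shape, and max_attempts are all assumptions for illustration, and a collections.deque stands in for the SQS queue. A real worker would also wait between attempts, which is left out here.

```python
from collections import deque
from typing import Callable, Optional

WorkItem = tuple[str, int]  # (asset key, attempts made so far)


def drain(
    queue: deque,
    read_asset: Callable[[str], Optional[bytes]],
    max_attempts: int = 5,
) -> tuple[list[str], list[str]]:
    """Process work items, re-queueing any whose asset is not visible yet."""
    processed: list[str] = []
    failed: list[str] = []
    while queue:
        key, attempts = queue.popleft()
        data = read_asset(key)
        if data is not None:
            processed.append(key)  # the real analysis step would run here
        elif attempts + 1 < max_attempts:
            queue.append((key, attempts + 1))  # not visible yet: try later
        else:
            failed.append(key)  # give up after max_attempts reads
    return processed, failed


# Fake reader: cat.jpg is "not yet visible" for its first two reads.
misses = {"/photos/cat.jpg": 2}


def flaky_read(key: str) -> Optional[bytes]:
    if misses.get(key, 0) > 0:
        misses[key] -= 1
        return None
    return b"data"


work = deque([("/photos/cat.jpg", 0), ("/photos/dog.jpg", 0)])
print(drain(work, flaky_read))
```

Every consumer of the store ends up carrying some version of this loop, along with a policy for what to do with items that never show up.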
When I first told my friend he could have used Qumulo, he brushed me off with his scalability requirement: “That sounds cool, but I have billions of assets. This is the sort of stuff you have to deal with when you’re at scale with billions, right?” I immediately responded, “Not at all! Qumulo is not only consistent, it also scales. We easily scale to billions of assets and maintain high performance along the way.”
Chris is a developer focused on bringing high-performance, enterprise-grade file storage to the cloud.