4 practical ways to make storage capacity management less painful

Posted October 31, 2017 by Mike Bott — Senior Systems Engineer. Filed under Engineering.

Nobody likes throwing things away, especially when that “thing” is data, which is how file systems get full. Sometimes file systems run out of capacity because of an engineering or user mistake, but often it’s just something that happens during a normal day. Admins typically don’t know the fine-grained value of the data the way their users do, so they can’t safely clean things up on the user’s behalf. But, at some point, something has to go.

The first challenge to regaining capacity is determining what to delete. To do that, you first have to figure out where to look. If you’re not familiar with recent activity in each directory structure (and who is?), you might try analyzing the file system with standard tools. This works great if the system only has ten thousand files in it. But what if it has ten million, a hundred million, or even a billion files? Assuming a single-threaded process, if each stat call takes a millisecond, visiting a hundred million files takes about a day and generates a steady load of 1,000 IOPS. So not only does the information take a long time to arrive, it’s already a little stale when it does, and that’s just the top level of your search: you’ll have to rinse and repeat as you descend into the file system.
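To make the cost concrete, here is a minimal sketch of that naive single-threaded walk in Python. The mount point is a made-up example; at roughly one stat call per millisecond over NFS or SMB, a hundred million entries take on the order of a day.

```python
import os

def scan(path):
    """Single-threaded tree walk: one stat call per file.

    At roughly 1 ms per stat call, a hundred million files take
    about a day to visit and generate a steady ~1,000 IOPS.
    """
    total_files = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))  # one metadata IOP
            except OSError:
                continue  # file vanished mid-scan; stale results are inherent
            total_files += 1
            total_bytes += st.st_size
    return total_files, total_bytes

files, size = scan("/mnt/bigfs")  # hypothetical mount point
print(f"{files} files, {size / 2**40:.2f} TiB")
```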

Obviously, you need a solution with better performance. For instance, you might multithread the process. With twenty workers performing stat calls in parallel, you can reduce the day-long operation to a little over an hour. The problem with this approach is that you now have a steady-state load of 1,000 IOPS times 20 workers, or 20,000 IOPS! That’s a significant workload, and the important takeaway is that it’s 20,000 IOPS your production systems can’t use. All in the name of knowing where your capacity is.
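A sketch of that parallel variant follows, using a worker pool with one directory per task. It carries the same assumptions as the example above, just fanned out: the scan finishes roughly twenty times faster, but it also drives roughly twenty times the metadata IOPS against the production system.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def stat_dir(path):
    """Stat every entry in one directory; return (n_files, n_bytes, subdirs)."""
    n_files = n_bytes = 0
    subdirs = []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.path)
                    else:
                        n_bytes += entry.stat(follow_symlinks=False).st_size
                        n_files += 1
                except OSError:
                    continue
    except OSError:
        pass  # directory vanished or permission denied; skip it
    return n_files, n_bytes, subdirs

def parallel_scan(root, workers=20):
    """Fan the tree walk out over a worker pool, one directory per task.

    Twenty workers cut the day-long scan to about an hour, at the price
    of a steady ~20,000 IOPS of metadata load that production
    workloads can't use.
    """
    total_files = total_bytes = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(stat_dir, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                n_files, n_bytes, subdirs = future.result()
                total_files += n_files
                total_bytes += n_bytes
                pending |= {pool.submit(stat_dir, d) for d in subdirs}
    return total_files, total_bytes

print(parallel_scan("/mnt/bigfs"))  # same hypothetical mount point
```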

We discussed capacity and other common storage pains during a recent webinar, which you can watch below:

Some ways to ease the pain of storage capacity management

When it comes to analyzing your capacity, there are a few standard techniques.

One technique is to make a full copy of the data in question as a backup and run stat calls against that copy’s metadata. This is not a terrible approach, because it puts the load on the backup rather than the production system. While baselining the backup is expensive in terms of throughput, pulling just the changes from the production file system afterward is a reasonable compromise. Keep in mind that this technique raises the cost of your backup tier, because the analysis software carries a price of its own; if you roll your own, you can keep that cost down.

A different option is to get more aggressive about scanning and build that functionality into the storage system itself, allowing external systems to query or request that metadata. This approach is not bad either. Running a local job to gather metadata cuts down on the round-trip time for all those stat calls. You’ll still use up some IOPS, because a tree walk and stat calls are still necessary, but the interface is more efficient than something like SMB or NFS.

Another approach is to use an external third-party system that scans everything you’ve got and gives you answers across the whole storage environment, including multiple storage vendors. If you have a lot of storage sprawl, a tool like this can help you get a complete picture, and that is very valuable. Many tools that do this also have some kind of data management or movement capability, so you could use what you learn about your storage environment to set up policy-based movement of data between tiers or workflow steps. The downside of this approach is that these tools still have to scan to find changes, so you haven’t really removed the metadata IOPS load from the storage systems, and you’ll still be a little behind in terms of updates.

Finally, you can do away with scanning and stat calls altogether by having files and directories regularly report their changes up to their parent directories, storing the aggregates in the file system’s already-existing metadata. This approach is a significant improvement because updates happen in near real time. If every object with fresh changes reports to its parent every 15 seconds, then in a directory tree eight levels deep, it takes at most two minutes (8 × 15 seconds) for the root to find out about an add or delete at the deepest level. That’s a lot better than an hour or a day! This is the approach QF2 uses for its real-time analytics.
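To make the aggregation idea concrete, here is a toy sketch. The class and method names are inventions for illustration; it shows only the propagation mechanics, not how a real file system (QF2 included) would persist or batch these updates.

```python
class DirNode:
    """Toy model of directory-level aggregation. Each directory carries
    a pre-computed subtree total, so "how big is this tree?" is answered
    by reading one number instead of walking millions of files."""

    def __init__(self, parent=None):
        self.parent = parent
        self.total_bytes = 0    # aggregate size of this entire subtree
        self.pending_delta = 0  # changes not yet reported to the parent

    def record_change(self, delta_bytes):
        """Called when a file under this directory is added, removed,
        or resized; also called by children flushing their deltas."""
        self.total_bytes += delta_bytes
        self.pending_delta += delta_bytes

    def flush(self):
        """Report accumulated changes one level up. If every dirty node
        flushes every 15 seconds, a change eight levels deep reaches
        the root in at most 8 * 15 s = 2 minutes."""
        if self.parent is not None and self.pending_delta:
            self.parent.record_change(self.pending_delta)
            self.pending_delta = 0

# A 10 GiB file lands in a subdirectory; one timer tick later the
# root's total already reflects it, with no tree walk anywhere.
root = DirNode()
projects = DirNode(parent=root)
projects.record_change(10 * 2**30)
projects.flush()
assert root.total_bytes == 10 * 2**30
```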

The QF2 approach has another advantage. No matter how much scanning you do and no matter how many stat calls you make, you still can’t easily answer the most important question: “Which data matters?” Everyone thinks their data is critical, but with QF2, if someone disputes the importance of a project that’s due to be archived, you can use analytics data gathered over time to show that it hasn’t been touched in months or years. That adds clarity to an otherwise murky storage decision. Conversely, the same analytics data lets you show that a file, even though it’s old, belongs to a data set that still gets used regularly.

Takeaways about capacity pain

As with any engineering task, it is up to you and your team to determine which approach works best for your environment. If you are experiencing pain around your storage capacity, here are a few top-level things to think about:

  • Don’t be afraid of new-ish vendors. Newer entrants to the market will probably have more modern ways of dealing with capacity analysis than older, more established vendors.
  • Look for storage optimizations. Everyone scans, so look for a storage system with optimizations such as metadata caching, clever methods of pruning the search, and local scanning.
  • Look for an API. If you value tight workflow integration, be sure you have programmatic access to the scanned data. An API is best, even if it can only query a database hosted on the storage system. You might want to integrate capacity data into your production management system or your media asset manager, and you want that analytics data to be easy to consume and manipulate (a sketch of what such a query might look like follows this list).
  • Use quotas or volumes. Use quotas or volumes to manage user behavior and to keep users from filling up your storage with their data. For example, QF2 has directory-based quotas that can be applied in real time.
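As an illustration of the kind of programmatic access worth looking for, here is a hedged sketch. The host, endpoint, and field names are hypothetical placeholders, not any vendor’s actual API; consult your storage system’s API documentation for the real shape.

```python
import requests

# Hypothetical host, endpoint, and field names, shown only to illustrate
# the kind of programmatic access to capacity data worth looking for.
BASE_URL = "https://storage.example.com/api"
TOKEN = "YOUR-API-TOKEN"

resp = requests.get(
    f"{BASE_URL}/v1/capacity/tree",
    params={"path": "/projects", "depth": 1},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

# Feed per-directory usage into a production management system or MAM.
for entry in resp.json()["entries"]:
    print(entry["path"], entry["capacity_bytes"])
```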

Capacity management is just one of the many challenges that come up on a daily basis for a storage admin. We recently held a webinar where I outlined the eight most common sources of storage pain and ways to solve them.

Mike Bott
Senior Systems Engineer

Mike is a Systems Engineer with over 15 years of experience in shared, high-performance mass storage systems, primarily for TV/film, internet media delivery, and supercomputing applications. His specializations include shared file systems, clustered file systems, NAS, SAN, and RAID storage.
