Compute and GPU time are the most expensive resources in a modern media production pipeline. Whether you're running a cloud render farm, a follow-the-sun editorial operation, or a hybrid on-prem and cloud VFX pipeline, the economics only work when your compute is producing, not waiting.
Cloud Native Qumulo on AWS already delivers enterprise file performance at cloud scale, backed by the virtually unlimited capacity of S3, with a single global namespace that spans regions and locations. Cache warming takes that a step further. It lets you tell the cluster exactly which data you want ready before the workload starts, so your artists and render nodes hit the ground running every time.
The API
Cache warming is exposed through three interfaces, available today in Qumulo Core 7.8.3+.
qq fs_fetch_tree walks an entire directory tree recursively and pulls every file into cache. One command warms a full project:
qq fs_fetch_tree --path "/folder-path/"
qq fs_fetch_file targets a single file, useful when you know exactly which asset is needed, or want to warm just one sequence before a review:
qq fs_fetch_file --path "/path-to-file"
For teams integrating warming into their own pipeline tooling, the same capability is available directly via the REST API: for individual files, it is a POST to /v1/files/{ref}/fetch-data. This makes it straightforward to build warming steps into render dispatch systems, pipeline orchestration tools, or shift handoff automation.
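As a rough illustration of the REST call, here is a minimal Python sketch that builds (but does not send) the request. Only the method and the /v1/files/{ref}/fetch-data path come from this post. The port (8000), the bearer-token header, the percent-encoded path used as {ref}, and the host and token values are assumptions for illustration, so check them against the API reference for your cluster.

```python
import urllib.parse
import urllib.request


def build_fetch_request(host: str, token: str, file_path: str,
                        port: int = 8000) -> urllib.request.Request:
    """Build, without sending, a POST to /v1/files/{ref}/fetch-data."""
    # Assumption: {ref} is the absolute file path, percent-encoded so that
    # every "/" becomes "%2F".
    ref = urllib.parse.quote(file_path, safe="")
    url = f"https://{host}:{port}/v1/files/{ref}/fetch-data"
    return urllib.request.Request(
        url,
        method="POST",
        # Assumption: the cluster accepts bearer-token authentication.
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Hypothetical host, token, and path, for illustration only.
    req = build_fetch_request("cluster.example.com", "EXAMPLE_TOKEN",
                              "/projects/show/shot_010.exr")
    print(req.get_method(), req.full_url)
```

A dispatcher could pass the resulting request to urllib.request.urlopen, or translate the same URL and header into whichever HTTP client the pipeline already uses.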
How CNQ Caching Works Today
Cloud Native Qumulo stores your file data durably in S3 and serves it from a high-performance local cache on each cluster node. As you work, the cluster intelligently manages what stays in cache, keeping your active working set hot and close, while the full project history sits safely in S3 behind it.
On top of that, NeuralCache, Qumulo's predictive read cache, adds an intelligent layer that watches access patterns and prefetches data it anticipates you'll need next. For the non-linear, creative nature of editorial and finishing work, NeuralCache is constantly getting ahead of the workload based on how your team actually works. The colourist who jumps between sequences. The supervisor reviewing shots out of order. In most cases, NeuralCache is seamless: requests are served from the cache with little to no latency overhead, and the system learns patterns over time.
But there are moments where you already know what's needed.
Why We Built Cache Warming
The primary driver for cache warming is the global namespace across portals (Hub and Spoke topology). Within a single cluster, NeuralCache is highly effective: it monitors access patterns, predicts what you'll need, and pulls it from S3 into the local NVMe cache in advance. Latency between a cloud node and S3 is small enough that this works seamlessly for most workloads. The challenge emerges when a Spoke is serving data through a portal relationship with a Hub in another region, or when you are bursting into the cloud. NeuralCache can still predict what the workload needs, but it can't get that data to you fast enough.
Cache warming complements NeuralCache by adding a proactive, directed layer on top. Rather than predicting access, you declare it. For known workloads (a render that starts at 6 am, a handoff that arrives at 9 am, a VFX batch that kicks off after review, a workload on the other side of the planet), you can ensure the data is already in the Spoke's local cache before the first request arrives.
This is where cache warming really earns its place: as a tool for the parts of your pipeline you can plan.
The Workflows That Win
Burst Rendering: Take Your Data To The CPU
Cloud burst rendering is one of the most powerful tools in a modern production pipeline. Scale up a hundred render nodes for a deadline, scale back down when the job is done. The economics are compelling, but they depend on one thing: those nodes need to be rendering, not waiting for data.
Without cache warming, each render job incurs a cold-start cost. The farm spins up, the first nodes begin requesting scene files, textures, and geometry caches, and the storage begins pulling that data from either your on-premises cluster or a distant Hub to fill the cache on the Spoke. The render progresses, but at a fraction of its potential throughput until the working set is warm. On a large farm running large datasets, that ramp-up window is real time with real CPU cost attached.
With cache warming, you feed the important data directly into NeuralCache. When the first render node comes online, the working set is already resident in NVMe. The farm hits full throughput from frame one. And if your master data lives on premises (ingest, finishing, archive, decades of project history), cache warming closes that gap entirely. You push the render assets to the cloud Spoke before the render farm starts. When the render nodes open the project, the data is already there.
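One way to wire this into a dispatcher is to gate the render on the warm step. The Python sketch below is an assumption-laden illustration rather than a documented integration. It assumes qq is installed and already authenticated on the machine running it, that qq fs_fetch_tree returns only once the warm has finished, and that a non-zero exit code means failure. The dispatch callback is a hypothetical stand-in for whatever starts your farm.

```python
import subprocess
from typing import Callable, Sequence


def warm_then_dispatch(
    project_path: str,
    dispatch: Callable[[], None],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Warm project_path, then start the render only if the warm succeeded."""
    # Same invocation as the fs_fetch_tree example earlier in this post.
    cmd = ["qq", "fs_fetch_tree", "--path", project_path]
    exit_code = run(cmd)
    if exit_code != 0:
        # Assumption: a non-zero exit code means the warm did not complete,
        # so the farm is not started against a cold cache.
        return False
    dispatch()
    return True
```

The run parameter exists so the gate can be exercised with a stub in tests, without a cluster or the qq binary present.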
Follow the Sun Editing
Qumulo's Cloud Data Fabric enables a Hub-and-Spoke topology across regions and facilities. One cluster serves as the Hub for your master data. Spoke clusters in other locations share access to that same namespace through a portal relationship, same paths, same file names, one project, everywhere.
A facility in Los Angeles runs the Hub. A finishing team in Sydney works through a Spoke. When the LA team wraps up for the day, the Sydney team starts their morning. Without cache warming, performance is bound by WAN bandwidth: as the Sydney team opens sequences and pulls up grades, the Spoke is filling its cache organically, asset by asset.
Cache warming changes that handoff entirely. End of day in LA, the pipeline knows what Sydney is working on tomorrow. You warm those directories on the Sydney Spoke while LA is wrapping up. When the team opens Premiere in the morning, every clip is already in cache. Scrubbing is smooth. Playback is clean. The first hour is as fast as the last.
Going Faster: qfetch for High File Count Workloads
The built-in qq fs_fetch_tree command works well for warming large files. For a 500 GB dataset of production ProRes files, it completes in around four minutes. Workloads involving thousands of files (frame sequences, texture libraries, audio stems, EXR plates) are a different story, because fs_fetch_tree walks the directory tree and fetches files sequentially, one at a time. For a 50,000-file dataset, that sequential approach took over two and a half hours in our testing.
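A quick back-of-the-envelope calculation on those two figures shows why file count matters more than total size here. This sketch uses only the rounded numbers quoted above, and treats "over two and a half hours" as a lower bound.

```python
# Many-small-files case: 50,000 files in "over two and a half hours".
file_count = 50_000
sequential_seconds = 2.5 * 60 * 60  # 9,000 s, treated as a lower bound

seconds_per_file = sequential_seconds / file_count
print(f"at least {seconds_per_file:.2f} s per file")  # at least 0.18 s per file

# Large-file case: 500 GB in "around four minutes".
large_file_gb = 500
large_file_seconds = 4 * 60
print(f"about {large_file_gb / large_file_seconds:.1f} GB/s")  # about 2.1 GB/s
```

At roughly a fifth of a second per file, the total time scales with the number of files when they are fetched one after another, which is the cost that parallel fetching targets.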
To address this, Qumulo Engineering built qfetch, an open-source tool we've made available to the community as an example of how to integrate with the new API for programmatic control. qfetch distributes fetch requests across configurable parallel walker and fetcher threads, hitting the API across dozens of simultaneous connections. On that same 50,000-file, 524 GB dataset, qfetch with 32 walkers and 64 workers completed the warm in 3 minutes 34 seconds.
qfetch --host <cluster-ip> --token-file token.json \
--path "/path-to-folder/" \
--walkers 32 --workers 64
qfetch is available on GitHub at https://github.com/Qumulo/qfetch.
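To make the walker and worker split concrete, here is a minimal Python sketch of the general pattern: one thread pool lists directories while a second pool issues fetches. It is not qfetch's actual implementation. The list_dir and fetch callables are hypothetical placeholders for "list one directory" and "warm one file" (for example, a REST call per file).

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, List, Tuple


def warm_tree(
    root: str,
    list_dir: Callable[[str], Tuple[List[str], List[str]]],
    fetch: Callable[[str], None],
    walkers: int = 4,
    workers: int = 8,
) -> int:
    """Walk root in parallel and fetch every file; return the file count."""
    with ThreadPoolExecutor(walkers) as walk_pool, \
            ThreadPoolExecutor(workers) as fetch_pool:
        pending_walks = {walk_pool.submit(list_dir, root)}
        fetches = []
        while pending_walks:
            done, pending_walks = wait(pending_walks,
                                       return_when=FIRST_COMPLETED)
            for walk in done:
                subdirs, files = walk.result()
                # Each subdirectory becomes a new walk task.
                pending_walks |= {walk_pool.submit(list_dir, d) for d in subdirs}
                # Each file becomes a fetch task on the worker pool.
                fetches += [fetch_pool.submit(fetch, f) for f in files]
        for task in fetches:
            task.result()  # re-raise any fetch error
    return len(fetches)


if __name__ == "__main__":
    # In-memory stand-in for a directory tree: path -> (subdirs, files).
    tree = {"/": (["/a"], ["/f0"]), "/a": ([], ["/a/f1", "/a/f2"])}
    print(warm_tree("/", lambda d: tree[d], lambda f: None))
```

Keeping the two pools separate means directory listing and file fetching can be tuned independently, which mirrors the separate --walkers and --workers settings in the command above.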
Intelligent and Intentional, Better Together
NeuralCache and cache warming work at different layers of the same challenge.
NeuralCache does the heavy lifting: watching how your team works, learning access patterns, and keeping the right data in cache without anyone having to think about it. Cache warming works alongside it as a direct complement, letting you reinforce what NeuralCache is already doing for workloads you can plan ahead. You know a render is starting, a shift is handing off, a new site is spinning up, so you get the data there first. Together, they keep the cache ahead of your workload, whether the next step was predicted or declared. That combination matters most when data has to cross a WAN between sites, where a cache miss carries real latency and bandwidth cost rather than just a few milliseconds.
What's Next
Cache warming is available today as a preview feature through the qq CLI on Cloud Native Qumulo on AWS running Qumulo Core 7.8.3 or higher.
If you're running media workflows in the cloud and want to talk through how cache warming fits your production model, we'd love to hear from you.