Cloud burst rendering with QF2

Posted September 27, 2017. Filed under Engineering.

If you watch our latest webinar, you’ll see a new demo that moves data sets to a QF2 cluster on AWS, renders them on cloud-based nodes, and then moves the results back to an on-prem QF2 cluster. Many QF2 customers run this workflow regularly, and I’d like to share some of what I learned while helping them set it up.

The problem

Too often, when you’re rendering a large job, or perhaps many jobs, it becomes painfully obvious that you need more resources. This realization usually occurs as your deadline looms dangerously on the horizon. What do you do?

Calculating Rental Nodes vs. Cloud, Cost vs. Time

It’s fairly straightforward to compare the cost of rental hardware nodes with the cost of compute and storage time on a cloud provider, and there is a break-even point. Rental has costs beyond the invoice, though: the time it takes to order, deliver, rack, and stack the nodes, the challenge of finding available rental hardware, and the need for enough data center space, power, networking, and cooling. Once you take those into account, the cloud starts to sound like a pretty good alternative if the demand is high enough. So just how do you burst your rendering to the cloud? There are object gateways, but the most commonly used rendering applications are file based, and who wants to deal with that mismatch? With Qumulo File Fabric (QF2), AWS, and some configuration, it can be done!
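To make the break-even idea concrete, here is a minimal sketch of the arithmetic. The function names are mine, and every rate in the example is a placeholder I made up for illustration, not a real rental or AWS price. Plug in your own quotes.

```python
def rental_cost(nodes, daily_rate_per_node, days, setup_fee=0.0):
    """Total cost of renting physical nodes for a fixed number of days.

    setup_fee stands in for one-time costs such as delivery and racking.
    """
    return nodes * daily_rate_per_node * days + setup_fee


def cloud_cost(nodes, hourly_rate_per_node, hours_per_day, days,
               storage_per_day=0.0):
    """Total cost of cloud render nodes that are billed only while running."""
    compute = nodes * hourly_rate_per_node * hours_per_day * days
    return compute + storage_per_day * days


def break_even_hours_per_day(daily_rate_per_node, hourly_rate_per_node):
    """Daily usage per node above which rental becomes cheaper.

    Ignores one-time fees and storage, so treat it as a first estimate.
    """
    return daily_rate_per_node / hourly_rate_per_node


# Placeholder numbers only: 10 nodes for 14 days.
rental = rental_cost(10, daily_rate_per_node=30.0, days=14, setup_fee=500.0)
cloud = cloud_cost(10, hourly_rate_per_node=2.0, hours_per_day=8, days=14,
                   storage_per_day=20.0)
print(f"rental: {rental:.2f}, cloud: {cloud:.2f}")
print("break-even hours/day:", break_even_hours_per_day(30.0, 2.0))
```

With these made-up rates the cloud wins as long as each node renders fewer than 15 hours a day. The model leaves out lead time, which is often the deciding factor when a deadline is close.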

Determining the Infrastructure

Essentially, you want to extend the physical on-prem render farm (and all the accompanying infrastructure) into the cloud. NFS/SMB over a WAN link can be cripplingly slow because of latency. On the other hand, a cloud cluster that can serve files locally to the cloud render nodes is reasonable to set up. Data sets can be replicated to the cloud and the results moved back. Obviously, different levels of compute are available in the cloud and this should figure into your cost calculations. Pay more for faster, more powerful compute or pay less for slower, interruptible resources.
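A rough way to see why latency hurts so much is to count round trips. The sketch below assumes a deliberately simple model in which every file needs a fixed number of strictly serialized round trips. Real NFS and SMB clients cache and pipeline requests, so treat the output as an upper-bound illustration. The round-trip times and counts are assumptions, not measurements.

```python
def serialized_io_seconds(num_files, round_trips_per_file, rtt_seconds):
    """Time spent purely waiting on the network if every round trip is serialized.

    Bandwidth is ignored on purpose: this isolates the latency cost.
    """
    return num_files * round_trips_per_file * rtt_seconds


# Assumed values: 10,000 small files, 5 round trips each (lookup, open, read, ...).
over_wan = serialized_io_seconds(10_000, 5, rtt_seconds=0.050)     # 50 ms WAN link
in_region = serialized_io_seconds(10_000, 5, rtt_seconds=0.0005)   # 0.5 ms in-cloud

print(f"over the WAN: about {over_wan:.0f} s of waiting")
print(f"in the same region: about {in_region:.0f} s of waiting")
```

Under these assumptions the same scene costs around 2,500 seconds of waiting over the WAN and around 25 seconds when the files are served next to the render nodes, which is the argument for a cloud cluster.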

You should also think about setting a reasonable checkpoint in your renders. If you choose a tier that can be interrupted, restarting the renders from the last checkpoint can be easy or painful depending on your configuration.
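If your checkpoint is simply "one finished frame on disk," restarting after an interruption can be as easy as working out which frames are missing. Here is a minimal sketch of that idea. The file naming scheme (`frame_0001.exr`) and the rule that a zero-byte file counts as unfinished are assumptions for illustration. Your renderer and queue manager may already handle this for you.

```python
from pathlib import Path


def missing_frames(output_dir, first, last, name_format="frame_{:04d}.exr"):
    """Return the frame numbers in [first, last] that still need rendering.

    A frame counts as done only if its file exists and is not empty, since an
    interrupted node may leave a zero-byte file behind.
    """
    out = Path(output_dir)
    todo = []
    for frame in range(first, last + 1):
        path = out / name_format.format(frame)
        if not path.is_file() or path.stat().st_size == 0:
            todo.append(frame)
    return todo
```

After an interrupted run you would call something like `missing_frames("/mnt/render/shot010", 1, 240)` (a hypothetical path) and resubmit only the frames it returns.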

You can automate the configuration of your cloud resources either with scripts or with deployment automation tools. There are plenty of packages out there and rolling your own is not that difficult.

Correctly Configuring the VPN

Of course, security is a primary concern, so you need to correctly configure your VPN. That connection provides the command and control communication to the cloud nodes and allows them to check out licenses from your on-premises license server. OpenVPN is great, and clients are readily available for both Linux and Windows. Some firewalls even support it natively! You’ll need to distribute keys and configuration files to each cloud node. You can also restrict the IP connectivity of the cloud instances so that only your on-prem network can access them, and vice versa: only your cloud instances should be able to reach your network.
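The IP restriction itself belongs in your firewall and cloud security group rules, but the logic is easy to sanity-check before you apply it. The sketch below uses Python's standard `ipaddress` module to test whether a source address falls inside an allowlist of networks. The CIDR ranges are made-up examples, not a recommendation.

```python
import ipaddress


def is_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical ranges: the on-prem render network and the VPN tunnel subnet.
ON_PREM_ALLOWLIST = ["10.20.0.0/16", "172.16.50.0/24"]

print(is_allowed("10.20.1.15", ON_PREM_ALLOWLIST))   # address inside 10.20.0.0/16
print(is_allowed("203.0.113.9", ON_PREM_ALLOWLIST))  # address outside both ranges
```

Running a list of known-good and known-bad addresses through a check like this is a cheap way to catch a typo in a CIDR range before it locks out your render nodes or exposes them.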

Accessing the License Server

You can’t render without a license server! Unfortunately, most licenses (and license servers) are keyed to a physical MAC address, and you probably have one or more already established in your environment. Spawning a virtual license server instance in the cloud is possible, but you’ll get a different IP (and MAC address) each time you start it up, which is painful if you need to get new licenses. By using a VPN, you direct all licensing queries from the cloud back to your infrastructure over a secure channel. (This assumes you have floating licenses available for the cloud render nodes.)

Using Queue Control

How do you manage and control the renders? In the past I’ve used Deadline, but any queue management software should work. The VPN connection provides connectivity back to your queue manager for the cloud instances and they should show up as regular clients (assuming you install all the appropriate packages). Here again, licensing works over the VPN connection. It makes sense to configure a separate group for only the cloud nodes.

Using QF2 Replication

How do you get your data to and from the cloud? With QF2, replication is easy to configure. You set the directory and start or schedule the job. Data is seamlessly replicated from an on-prem cluster to a cloud instance of QF2. Again, this traffic can flow over the secure VPN connection.

Do it!

Now that the infrastructure is in place, let’s get some scenes rendered! Replicate a dataset that needs to be rendered from your on-prem cluster to the AWS QF2 instance. The render nodes mount an NFS export (or SMB share) from the cloud QF2 instance, so the data is served “locally” within the cloud. It goes without saying you want all the render nodes and the QF2 instance in the same region. Fire up the queue manager and send a job to the cloud nodes. They should work the same way as the local nodes do. Once the job is complete, replicate the resulting frames back to your on-prem cluster.
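Before you tear anything down, you may want an independent check that every rendered frame made it back. The sketch below is not a QF2 feature. It is a generic comparison of two directory trees by SHA-256 checksum, and it assumes both the cloud output and the on-prem copy are mounted on the machine running it. The function names are my own.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large frames do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path relative to root onto its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(source_root, dest_root):
    """Return (missing, changed): files absent from dest, and files that differ."""
    src, dst = manifest(source_root), manifest(dest_root)
    missing = sorted(set(src) - set(dst))
    changed = sorted(name for name in src.keys() & dst.keys()
                     if src[name] != dst[name])
    return missing, changed
```

If both lists come back empty, the frames on the on-prem side match what the cloud nodes wrote, and it is safe to move on to shutting things down.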

Closing down

Once the job is finished, you can either shut down or terminate the QF2 cluster and render nodes. If you shut the instances down, they will only accrue cloud storage charges. Fire the cluster back up again the next time you need to burst to the cloud. Alternatively, you can terminate the cluster and set it back up again when you need it.

Happy Rendering!

