“Qumulo is at the foundation of our AWS storage solution. Without it, we wouldn’t be able to expand to the capacity that we have.”
Jason Fotter, Co-Founder & CTO
ABOUT
FuseFX is an award-winning studio specializing in visual effects for episodic television, film, commercials, games, and special venues. FuseFX employs around 300 people and has three studio locations: its flagship Los Angeles office, New York City, and Vancouver, BC.
USE CASE
- Visual Effects
- Rendering
REQUIREMENTS
- Fast, clustered storage
- Enterprise features
- Innovation & long-term strategy alignment
FuseFX’s journey to the hybrid cloud with Qumulo
Cloud innovation solves FuseFX’s unsolvable capacity problem
Today, FuseFX’s three locations have more than 60 television shows in production simultaneously, in addition to various commercial and feature film projects. The company has provided visual effects for all the major studios on such productions as American Horror Story, Marvel’s Agents of S.H.I.E.L.D., and The Tick.
Jason Fotter, co-founder and CTO at FuseFX, is very aware of the challenges that come with building and running a render farm. “For me, it’s been a ‘learn as you go’ process. I’ve been surprised many times throughout the growth of the company. The amount of power and heat that a render farm generates, and the infrastructure needed to carry it, is massive,” he stated.
“I’ve found over the years that, no matter what size farm you have, you can easily overrun it at any given moment. The more you have, the more you will use. The problem arises when you are up against a delivery and time is not on your side. We need to be able to act quickly at these moments and that’s hard to do with physical infrastructure. Power, cooling, and physical space are all finite resources that put limits on what you can achieve.”
An ever-present constraint is that episodic television shows have tight deadlines. “We have two to three weeks to get our work done [with episodic television]. Feature films have six months to a year or more. Commercials define their own schedules. TV is a churning process. You get your shots in, you get two or three weeks to do them, and boom they’re out, next episode, same thing, next episode, same thing. It’s really fast-paced,” commented Fotter.
Due to aggressive schedules, success can bring its own set of problems. Even renting equipment may not be a feasible solution. Given how long it takes to order, deliver, and rack-and-stack the nodes, the difficulty of finding available rental hardware, and the need for enough data center space, power, networking, and cooling, it may seem like there is no answer, unless an organization starts looking at the cloud.
“Before the cloud, I don’t know if there was a solution. Maybe really expensive co-location, or some other crazy scenario, but the cloud started to become a reasonable way for us to get some of our more pressing render jobs done,” said Fotter.
Qumulo enables the file workflow FuseFX was looking for in the public cloud
By late 2016, Fotter knew Bracket Computing was no longer going to be an option and he began looking for alternatives. “I was really focused on price and performance. Who had the features that we were looking for? Who wanted to develop a relationship with us in VFX rendering? I thought our process was really innovative and I wanted someone who felt the same way.”
While he was evaluating his options, Amazon bought Thinkbox, the creators of Deadline, a software package that manages rendering pipelines. FuseFX was already running Deadline in the cloud and AWS was looking for just such a customer, so Fotter knew he had found the partner FuseFX was looking for.
One of Fotter’s, and FuseFX’s, goals was to expand the virtual render farm. With the Bracket solution, Fotter was running a single, high-powered Linux instance on AWS, but the storage architecture couldn’t handle more than 200 to 300 virtual machines.
Fotter knew he needed fast, clustered storage if he wanted to run more instances. “We came up with all kinds of ideas. We thought about leveraging S3 and syncing everything to the local machines, but that didn’t fit with the way we work. We talked to Avere multiple times, but they’re very NFS-centric and we’re a Windows shop. Nothing was really hitting the mark for exactly what I was looking for.”
FuseFX already had a Qumulo cluster on-premises. Fotter had spoken with Qumulo about his need for a cloud-based solution, and when he learned that the company was offering its software on AWS, he jumped at the chance to try it out. Qumulo’s Cloud Q on AWS leverages Amazon EC2 and EBS. The team experimented with a single instance early on and liked what they saw, so when the four-node cluster became available, Fotter was ready to integrate it into his production workflow.
BENEFITS
- Real-time Visibility. Active monitoring & support.
- Scale Across. File storage in the public cloud.
- Enterprise Proven. Flexible capacity and performance.
FuseFX puts their solution to the ultimate test while rendering The Tick
The Qumulo cluster was put to the test when the company was working on an episode of The Tick. “Our process is that people work during the day, submit their jobs, then we render overnight,” described Fotter. “When they come in the next day, they look at the frames, evaluate where they’re at, and either send it off to the next task, or they might decide they need to re-render something.”
“And again, we only have two to three weeks for a single episode. We often start a project close to the delivery of the first episodes. We don’t have a lot of time to waste. If we have a problem, it’s always a critical problem. We came in one morning and discovered there had been problems overnight. There must have been 50 jobs queued up that hadn’t rendered a single frame. The stress level of the production team was pretty high at that moment. We had been targeting 1,000 machines as a maximum target for capacity. I knew that a moment would come where we would want to burst that high, and it was apparent that now was that time. Each EC2 Spot instance was 32 cores, so that’s 32,000 cores at one time!”
“I told my render wranglers that if they had a frame to render, turn on a node for it. Just get it done. We knew that with Qumulo we would be able to support that kind of throughput. And we did it. We got the frames rendered in the cloud and got them back down on-premise. We were actually rendering so fast that the bottleneck was getting the frames back from our cloud cluster.”
“We saved ourselves. That’s actual proof that the solution works. There’s no possible way I could install 1,000 machines in our network here. I don’t have the power or cooling to support them. We were able to make the decision, and in less than one hour be rendering on 1,000 machines. After the jobs finished, we simply terminated the instances. When I think about how easy it was, it still doesn’t sound real.”
Smart application utilization helps make FuseFX’s infrastructure sing
Besides Qumulo, the FuseFX pipeline uses EC2 Spot Instances for scalable, low-cost computing; Deadline for queue management and for managing bids on the Spot Instances; Thinkbox Marketplace usage-based licensing (UBL) for flexible licensing; and V-Ray for rendering.
“If you exhaust your local licenses, you can purchase per-minute or per-hour licenses of Deadline and V-Ray. Once your local license limit is reached, the software sends those requests to the store, monitors the usage and deducts from that time. It’s like a calling card. You buy a calling card with an hour of calling time on it and every call you make deducts from that.” Everything is coordinated by the on-premises server, which is connected to the cloud instances with a VPN.
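The calling-card analogy can be made concrete with a small model. The sketch below is illustrative only: the `LicensePool` class, its fields, and the idea of charging a job's full duration up front are assumptions made for the example, not how Deadline or the Thinkbox Marketplace actually meter usage.

```python
from dataclasses import dataclass


@dataclass
class LicensePool:
    """Hypothetical model: fixed local seats plus a prepaid pool of minutes."""
    local_seats: int
    prepaid_minutes: float
    in_use: int = 0

    def checkout(self, job_minutes: float) -> str:
        # Prefer a local seat while one is still free.
        if self.in_use < self.local_seats:
            self.in_use += 1
            return "local"
        # Local seats exhausted: draw down the prepaid balance,
        # the way a call deducts time from a calling card.
        if job_minutes <= self.prepaid_minutes:
            self.prepaid_minutes -= job_minutes
            return "usage-based"
        # Neither a seat nor enough prepaid time is available.
        return "denied"


pool = LicensePool(local_seats=2, prepaid_minutes=60)
results = [pool.checkout(25) for _ in range(5)]
print(results, pool.prepaid_minutes)
```

With two local seats and 60 prepaid minutes, the first two 25-minute jobs take local seats, the next two draw the balance down to 10 minutes, and the fifth is refused because only 10 minutes remain.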
Once the data is synchronized to the Qumulo cluster in AWS, rendering can occur both locally and in the cloud at the same time. A local machine can, for example, pick up the first frame, and a cloud node can pick up the second frame. Deadline manages the distribution so that the cloud is simply an extension of the on-premises render farm.
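The idea of one queue feeding both local and cloud nodes can be sketched as follows. This is a simplified round-robin illustration, not Deadline's scheduling logic; the `distribute_frames` function and the worker names are hypothetical.

```python
from collections import deque
from itertools import cycle


def distribute_frames(frames, workers):
    """Hand each frame to the next worker in turn, wherever that worker runs."""
    queue = deque(frames)
    assignments = {}
    # Workers take turns pulling from the single shared queue.
    for worker in cycle(workers):
        if not queue:
            break
        assignments[queue.popleft()] = worker
    return assignments


# One local machine and one cloud node sharing a four-frame job.
plan = distribute_frames(range(1, 5), ["local-01", "cloud-01"])
print(plan)
```

Here the local machine receives frames 1 and 3 and the cloud node receives frames 2 and 4. Because both read from the same queue, adding cloud nodes changes only the list of workers, not the job itself.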
FuseFX is still working on automation. “We use a custom AMI that has some internal automation. For that, we use CloudFormation. It gets itself on the network, mounts the Qumulo storage, sets up the Deadline Slaves, and a few other things. Right now, we start and terminate the Qumulo instances manually,” said Leslie.
“If we have a long-term timeframe where we know we’re not going to use Qumulo, we terminate it and we tell the Qumulo support team. We’ve learned that we should tell them when we’re turning it off because they monitor it so nicely that, otherwise, when we do terminate it, people start calling me to tell me my cloud cluster is down.”
The importance of a well-orchestrated workflow
Fotter has learned quite a bit since FuseFX first began using the cloud. “Getting the workflow right is the biggest challenge. Rendering is complicated, and visual effects is an inherently inefficient process. The more that you can create efficiencies in the workflow, the better off you’re going to be,” he said.
“Solving the data synchronization issue is the hardest part because render jobs require a lot of assets, textures, geometry, simulation caches, and whatever else you need to create the final image. When you’re rendering in the cloud, if you’re missing one little texture and that job renders incorrectly, you’ve wasted all that money. We’ve gone through those pains.
“We’ve learned the hard way, but being committed to the process and knowing that you can create a solution has always been my focus. So, to boil it down, my advice is to test it. Come up with a plan, test it, be committed to it, and really understand your workflow from start to finish.”
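One way to guard against the missing-texture problem Fotter describes is a pre-flight check that runs before a job is submitted. The sketch below is an assumption about how such a check might look, not FuseFX's actual tooling; the `missing_assets` function, the mount point, and the file names are hypothetical.

```python
from pathlib import Path


def missing_assets(required, storage_root):
    """Return the required relative paths that are absent under storage_root."""
    root = Path(storage_root)
    # A job should only be submitted when this list comes back empty.
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical asset list for one shot and a hypothetical mount point.
    needed = ["textures/hero_diffuse.exr", "caches/smoke_0001.vdb"]
    absent = missing_assets(needed, "/mnt/cloud_storage/show/shot_010")
    if absent:
        print("Do not submit yet; assets not synchronized:", absent)
    else:
        print("All assets present; safe to submit.")
```

Running the check against the cloud-side storage rather than the local copy is the design choice that matters: it verifies what the cloud render nodes will actually see.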
In the end, file, not object, is key to the visual effects process
Fotter also affirmed the importance of file-based data to his workflow. “It would be nice to be able to use object storage, but we don’t have a single product in our environment that uses it. It doesn’t make sense. We’re a file-based workflow. That’s the way the visual effects process works. We have a large amount of files on a file system. We read them. We pull them into our applications. We work on them. We do our creative work and we create more files.”
“Files are the medium of exchange between applications that were not necessarily written by the same company. How do you get something from the animation package into the rendering package? Those are two different disciplines, two different areas of focus, so you must create workflows that integrate across applications, and a file is the way to do that.”
“It follows then, that without a high-performance file system in the cloud, our workflow would be impossible. Qumulo is at the foundation of our AWS storage solution. Without it, we wouldn’t be able to expand to the capacity that we have.”