It’s 2005. Bowling averages are way up, mini-golf scores are way down. The Foo Fighters and Nickelback both have two top 20 hits. It’s two years before the first terabyte drives ship, so if you want to upgrade your storage, you’re looking at a lot of heavy lifting.
In customer support, we called these forklift upgrades. The idea is that you want to upgrade, but you can’t mix platforms. So you call up your favorite sales guy, get him to airlift you some new storage, migrate everything over, and bingo boom! Then you call your buddy with a pallet jack to haul the old stuff out of the data center.
It works. It’s a real pain, but it works.
Flash forward to 2023. Nickelback is nothing but a photograph, and we’ve heard the best of the Foo Fighters. And here at Qumulo, we saw something that worked and said ‘we can do better’.
Enter Transparent Platform Refresh.
Transparent Platform Refresh (TPR, also called node replace) lets you refresh hardware in place, without a migration. At a high level, you rack the new nodes and kick off a job that moves all the data from the old nodes to the new ones, then evicts the old nodes. Gone are the days of juggling multiple replication jobs. Gone are the days of cutover windows. Now it just “happens”.
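For the curious, here’s a toy Python model of the “move everything, then evict” idea. It is purely illustrative and is not how Qumulo implements TPR (the real work is done by Qumulo’s restriper engine): the ToyCluster class, its block location map, and the round-robin placement are all hypothetical simplifications. What it does show is the ordering that lets a refresh happen while data is still being served: copy a block, repoint reads at the new copy, and only then free the old one, evicting each old node once it’s empty.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical toy model of an in-place node refresh (not Qumulo's code)."""

    # node name -> {block id: payload}; a stand-in for on-disk storage
    nodes: dict[str, dict[str, bytes]] = field(default_factory=dict)
    # block id -> node currently holding it; every read goes through this map
    location: dict[str, str] = field(default_factory=dict)

    def add_nodes(self, names: list[str]) -> None:
        for name in names:
            self.nodes.setdefault(name, {})

    def write(self, block: str, payload: bytes, node: str) -> None:
        self.nodes[node][block] = payload
        self.location[block] = node

    def read(self, block: str) -> bytes:
        return self.nodes[self.location[block]][block]

    def replace_nodes(self, old: list[str], new: list[str]) -> None:
        """Move every block off `old` onto `new`, then evict the old nodes."""
        if not new:
            raise ValueError("need at least one new node")
        self.add_nodes(new)
        i = 0
        for src in old:
            for block in sorted(self.nodes[src]):  # sorted() copies the keys
                dest = new[i % len(new)]  # simple round-robin placement
                i += 1
                self.nodes[dest][block] = self.nodes[src][block]  # 1. copy
                self.location[block] = dest                       # 2. repoint reads
                del self.nodes[src][block]                        # 3. free old copy
                # Between any two steps, read(block) still finds the data.
        for src in old:
            assert not self.nodes[src], f"{src} still holds data"
            del self.nodes[src]  # evict the now-empty node


if __name__ == "__main__":
    airport = ToyCluster()
    old_nodes = [f"old{n}" for n in range(1, 9)]  # 8 old nodes
    airport.add_nodes(old_nodes)
    for b in range(40):
        airport.write(f"blk{b}", f"payload {b}".encode(), old_nodes[b % 8])
    airport.replace_nodes(old_nodes, [f"new{n}" for n in range(1, 6)])  # 5 new nodes
    print(sorted(airport.nodes))
    print(airport.read("blk7"))
```

The copy-before-repoint ordering is the design choice that matters here: a reader never sees a block location that doesn’t hold the data.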
That sounds great for moving to a new platform, but what else do you get out of it? Let’s run through a quick example:
Say we have a cluster on the older QC208 platform. We’ll call it ‘Airport’, since it’s about to fly away. It started as a 4-node cluster and has grown to 8 nodes over time. Here’s a rundown of where it stands today:
Nodes – 8
Capacity – 1086TB
Efficiency – 65.3%
If we use TPR to move it to the C-432 platform, the numbers change to:
Nodes – 5
Capacity – 1511.1TB
Efficiency – 70%
On top of that, network port usage drops from 16 ports to 10, total weight drops by about 800 pounds, and power draw drops by 3,600 watts, with heat output dropping by roughly 12,000 BTU/hr.
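The before/after comparison is simple arithmetic on the figures quoted above, so here’s a small Python sketch that derives a few extra numbers from them. Two inputs are my own assumptions rather than Qumulo specs: two network ports per node (inferred from 16 ports on 8 nodes and 10 ports on 5), and the standard conversion of about 3.412 BTU/hr of heat per watt.

```python
# Before/after figures quoted above for the 'Airport' cluster.
BEFORE = {"nodes": 8, "capacity_tb": 1086.0, "efficiency_pct": 65.3}  # QC208
AFTER = {"nodes": 5, "capacity_tb": 1511.1, "efficiency_pct": 70.0}   # C-432

# Assumption: 2 network ports per node (16 / 8 and 10 / 5 both give 2).
PORTS_PER_NODE = 2
# Standard conversion: 1 watt dissipated is about 3.412 BTU/hr of heat.
BTU_PER_HR_PER_WATT = 3.412


def pct_change(old: float, new: float) -> float:
    """Percent change going from old to new."""
    return (new - old) / old * 100.0


def summarize(before: dict, after: dict) -> dict:
    """Derive a few comparison figures from the raw before/after numbers."""
    return {
        "capacity_gain_pct": round(pct_change(before["capacity_tb"], after["capacity_tb"]), 1),
        "tb_per_node_before": round(before["capacity_tb"] / before["nodes"], 2),
        "tb_per_node_after": round(after["capacity_tb"] / after["nodes"], 2),
        "ports_before": before["nodes"] * PORTS_PER_NODE,
        "ports_after": after["nodes"] * PORTS_PER_NODE,
        "efficiency_gain_points": round(after["efficiency_pct"] - before["efficiency_pct"], 1),
    }


def watts_to_btu_per_hr(watts: float) -> float:
    """Convert a power figure in watts to heat output in BTU/hr."""
    return watts * BTU_PER_HR_PER_WATT


if __name__ == "__main__":
    for key, value in summarize(BEFORE, AFTER).items():
        print(f"{key}: {value}")
    # The 3,600 W power saving, expressed as heat no longer dumped into the room.
    print(f"heat_saved_btu_per_hr: {watts_to_btu_per_hr(3600):.0f}")
```

With these inputs, capacity rises by about 39% while capacity per node more than doubles (roughly 136 TB to 302 TB), and 3,600 W works out to about 12,300 BTU/hr, in line with the rounded 12,000 figure.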
This all takes place while you’re still serving data.
Once the job completes, TPR powers off the old nodes, and then it’s just a matter of unracking them and turning them into a coffee table.
This sounds great, but let’s get down to brass tacks. What are the support cats seeing?
Well, to be blunt, it is as great as it sounds. Qumulo built TPR on the existing restriper engine, which handles the background data movement, so we’re not stuck fighting some sparkly new code written in a Jolt Cola-fueled hackathon. Quite simply, it just works.
As of Qumulo Core 6.1.2, TPR is front and center in the command-line interface (qq replace_nodes) and the REST API (/cluster/nodes). As a support person, I still recommend reaching out in your Slack channel before going full cowboy on it, since there may be caveats and corner cases worth discussing. By and large, though, it’s fair game. Go forth and refresh!
And yes, we still have more excellent water slides than any other planet we communicate with.
Until next month!