This is part three of a five-part blog series taking a closer look at our new suite of data services, which help our customers radically simplify file data management at scale. We've covered NVMe Cached Performance and Qumulo Dynamic Scale in previous posts. Here, we provide an overview of Qumulo Instant Upgrade. Future posts in this series will go more deeply into the other new data services included in this announcement.
Qumulo Instant Upgrade automates software upgrades, making it simple for you to gain access to the latest features, security and performance enhancements
IT administrators have historically dreaded planning and performing upgrades, not to mention putting rollback plans in place in case of a failure. Yet upgrading is necessary to take advantage of new features and performance enhancements, and to keep systems secure against the latest cybersecurity threats.
The Problem: A Choice Between Disruption and Time
Upgrade strategies for scale-out infrastructure systems historically have fallen into two camps:
- Rolling upgrade – a single node is taken offline, upgraded, and brought back online, then the upgrade moves on to the next node. The total time to upgrade a system grows linearly with the size of the system, so rolling upgrades can take hours, if not days, to complete in very large clusters. And if the upgrade fails on some nodes partway through, a rollback plan is needed to return to the original configuration. Rolling back is extremely time-consuming, and it puts both system integrity and the maintenance window at risk.
- Simultaneous upgrade – all the storage nodes are upgraded at the same time. This typically requires system downtime: application owners need to pause their applications and then bring them back online after the system returns. If the upgrade fails on any node, application downtime is extended while administrators work through the cumbersome process of rolling back the upgrade on the other nodes, then verifying that everything is back in its original state and operating properly.
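To make the trade-off between the two camps concrete, here is a back-of-the-envelope model in Python. It is a sketch under loudly stated assumptions: `minutes_per_node` is a hypothetical fixed cost to upgrade and restart one node (real durations vary with hardware, data volume, and release), rollbacks are not modeled, and the rolling case assumes the cluster has enough redundancy to keep serving with one node offline. It shows only the shape of the problem: rolling time scales with node count, while a simultaneous upgrade trades that for a full-cluster outage.

```python
from dataclasses import dataclass


@dataclass
class UpgradeEstimate:
    total_minutes: float        # wall-clock time to finish upgrading the whole cluster
    full_outage_minutes: float  # time during which the entire cluster is unavailable


def rolling_upgrade(nodes: int, minutes_per_node: float) -> UpgradeEstimate:
    # One node at a time: total time grows linearly with cluster size.
    # Assumption: the cluster tolerates one node offline, so it never fully goes down.
    return UpgradeEstimate(total_minutes=nodes * minutes_per_node,
                           full_outage_minutes=0.0)


def simultaneous_upgrade(nodes: int, minutes_per_node: float) -> UpgradeEstimate:
    # All nodes at once: total time does not depend on cluster size,
    # but the whole cluster is offline for that entire period.
    return UpgradeEstimate(total_minutes=minutes_per_node,
                           full_outage_minutes=minutes_per_node)


if __name__ == "__main__":
    per_node = 30.0  # hypothetical minutes per node, for illustration only
    for n in (4, 100):
        r = rolling_upgrade(n, per_node)
        s = simultaneous_upgrade(n, per_node)
        print(f"{n:>3} nodes: rolling {r.total_minutes:.0f} min total, "
              f"simultaneous {s.full_outage_minutes:.0f} min full outage")
```

Even with these simplifications, going from 4 to 100 nodes multiplies the rolling estimate by 25, while the simultaneous estimate stays flat but takes every application offline.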
A Better Approach: Instant Upgrade
Qumulo believes that you should spend your weekends doing the things you enjoy. For the past eight years, Qumulo has been on a journey to simplify unstructured data management. At the beginning of that journey we knew that making upgrades fast and easy would be a critical promise to make to our users, so we chose a software architecture in which Qumulo Core runs in user space above Linux.
This approach gives us the flexibility of an application; however, taking advantage of that flexibility required a fundamental innovation in our product: containerization. Over the past six months, we re-imagined the way we package our file system and the 25+ services we rely on into a single runtime container. With this change, our customers can upgrade four nodes or 100 nodes with the same single-button process, upgrading their entire cluster with only 20 seconds of outage perceived by end users.
In designing Instant Upgrade, we leverage the unique advantages of Qumulo being a software-defined, fully containerized file system. We package all of the Qumulo software and data services into a single software-defined container that is resident on each node within a production cluster. When Instant Upgrade begins, a second container with the newer version of Qumulo Core is created and brought online in parallel, while the older software version remains online and in production. Once the new version is running and validated, active processes from the older version of Qumulo Core are seamlessly moved to the new version running in the new container (see Figure 1). The old container is later removed.
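The container swap described above resembles a per-node blue/green cutover: stage the new version beside the old, validate it, shift active work over, then retire the old container. The Python sketch below models only that sequence. `Container`, `Node`, `stage`, and `instant_upgrade` are hypothetical names, no real container runtime is invoked, the `healthy` flag stands in for whatever validation Qumulo Core actually performs, and the all-or-nothing policy (cut over only if every node validates, otherwise discard the staged containers) is an assumption for illustration rather than a description of Qumulo's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    version: str
    healthy: bool = True   # hypothetical stand-in for the real validation checks
    serving: bool = False  # True while this container handles client traffic


@dataclass
class Node:
    name: str
    active: Container
    staged: Optional[Container] = None


def stage(node: Node, new_version: str, healthy: bool = True) -> None:
    # Step 1: start a second container beside the running one.
    # The old container keeps serving traffic throughout.
    node.staged = Container(version=new_version, healthy=healthy)


def instant_upgrade(nodes: list[Node], new_version: str,
                    unhealthy: frozenset[str] = frozenset()) -> bool:
    # Stage on every node (sequential here; a real system would parallelize).
    # `unhealthy` lets a caller simulate nodes whose new container fails validation.
    for node in nodes:
        stage(node, new_version, healthy=node.name not in unhealthy)

    # Step 2: validate. Assumed policy: cut over only if every staged container
    # is healthy; otherwise drop the staged containers and keep the old version.
    if not all(n.staged is not None and n.staged.healthy for n in nodes):
        for n in nodes:
            n.staged = None
        return False

    # Step 3: move active work to the new container (the brief perceived outage).
    # Step 4: remove the old container by dropping the reference to it.
    for n in nodes:
        assert n.staged is not None
        n.active.serving = False
        n.staged.serving = True
        n.active, n.staged = n.staged, None
    return True


if __name__ == "__main__":
    cluster = [Node(f"node{i}", Container("old", serving=True)) for i in range(4)]
    print(instant_upgrade(cluster, "new"), [n.active.version for n in cluster])
```

Because the old container keeps serving until validation passes, a failure in this model leaves the cluster on its original version with nothing to roll back, which is the property the text contrasts with rolling and simultaneous upgrades.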
Instant Upgrade to Qumulo Core is:
- Simple – Initiate the upgrade of any size cluster with the push of a single button
- Fast – An upgrade of any size cluster completes with only 20 seconds offline
- Reliable – No need to plan for application downtime, no noticeable impact to performance
When the underlying host operating system or drivers need to be upgraded, Instant Upgrade automates that process as well, including initiating the required reboot.
Figure 1: Qumulo Instant Upgrade
Why is Qumulo Uniquely Able to Offer the Instant Upgrade Experience?
Instant Upgrade is possible because Qumulo runs as a "user space" application above Linux. Other file systems run in "kernel space," with deep customizations of, and dependencies on, specific operating system kernels. Because Qumulo is a user space application, we are able to containerize it, which in turn lets us move the active container from one version to another while the operating system and kernel keep running.
Other file systems typically require a separate upgrade process for each component: metadata servers, NAS gateways, data analytics and UI, and the underlying storage servers. These components are often on separate release schedules and need to be upgraded at different times throughout the year.
Innovating on Behalf of Our Customers
Building this was no simple engineering feat, but it has a real payoff for users who want to spend their time on strategic work, or with family and friends, instead of on tedious system administration tasks. Now the entire cluster, including the operating system, drivers, and services, is upgraded with the push of a button during the standard business day.
While we began this software project earlier this year, we really began this journey eight years ago, when Neal Fachan and the rest of our founding team imagined a software-first future and made architectural decisions to enable that reality. No customer asked us to "run in user space," but we knew that it would be critical to unlocking future innovation. With the release of Instant Upgrade, we have made good on those bets.