Qumulo Distributed File Storage: Purpose-Built to Meet the Demands of High Performance Computing (HPC) Environments

In an environment where delivering critical analysis and results is paramount, HPC workloads are incredibly dependent upon their storage system.

The High-Performance Computing (HPC) market is perhaps the fastest-growing sector in the IT world today. Hyperion Research forecasts that the total HPC market will reach $44B by 2023, with storage-related investments accounting for $7.8B of that.

Three key points from Hyperion’s research are:

  1. Storage systems will become increasingly critical;
  2. Cloud computing for HPC workloads will grow fast;
  3. Artificial Intelligence (AI) will grow faster than everything else.

Artificial Intelligence (AI) is growing fast

As mentioned above, AI is growing fast. Based on a report from Grand View Research, the Artificial Intelligence market will grow at a CAGR of 57.2% from 2017 to 2025 (reaching $35.8B). The analyst firm says:

“Artificial Intelligence (AI) is considered to be the next stupendous technological development, alike past developments such as the revolution of industries, the computer era, and the emergence of smartphone technology.” Grand View also notes that North America, in particular, is expected to dominate AI deployments due to “the availability of high government funding, the presence of leading players and strong technology base.”

The HPC market covers many research areas, including life sciences and medicine, climate research, earthquake detection and tectonics, automated driving, physics and astrophysics, agriculture, and energy.

The best enterprise data storage solutions for unstructured data are distributed file systems that can process huge amounts of data for HPC workloads, often consisting of hundreds of millions or billions of small data points, to extrapolate key research information. Furthermore, Internet of Things (IoT) innovations are helping to capture these data points at record levels. Sensors in use today, spanning a huge range of electronic devices from autonomous vehicles and smart cities to industrial manufacturing and supply chains, are streaming massive amounts of data in real time to centralized systems for analysis.

With the continuous growth of technologies like AI, machine learning (ML), and 3D imaging, the size and amount of data that organizations have to manage and store will continue to explode to petabyte-scale levels (and beyond).

When developing its next generation scale-out NAS, Qumulo had several key objectives in mind to meet HPC-specific requirements

Storage systems are becoming more critical as organizations need to keep pace with rapid data growth by scaling easily, without disruption, when and where needed. Qumulo’s distributed file system, Qumulo Core, uses a node-based architecture that allows organizations to scale both performance and capacity across on-premises and cloud environments. Qumulo’s software-defined storage architecture uses clusters of nodes made up of Qumulo hardware or pre-qualified industry-standard hardware from HPE or Fujitsu, providing several platform options, including all-NVMe, active archive and cloud, and more.

In an HPC environment where delivering critical analysis and results is paramount, HPC workloads are incredibly dependent upon their storage system

To meet the performance requirements of HPC workloads, Qumulo’s all-NVMe platform delivers industry-leading IOPS with extremely low latency. Latency is as important as IOPS in HPC environments, because low latency lets the system process large numbers of transactions quickly.
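The link between latency and IOPS can be sketched with Little's Law, which says that sustained throughput equals the number of requests in flight divided by the time each request takes. The queue depth and latency values below are hypothetical and chosen only for illustration; they are not Qumulo measurements.

```python
def achievable_iops(outstanding_requests: int, latency_seconds: float) -> float:
    """Little's Law: throughput = requests in flight / time per request."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return outstanding_requests / latency_seconds


# Hypothetical workload: a client that keeps 32 requests in flight.
QUEUE_DEPTH = 32

for latency_ms in (5.0, 1.0, 0.2):
    iops = achievable_iops(QUEUE_DEPTH, latency_ms / 1000)
    print(f"latency {latency_ms:>4} ms -> up to {iops:,.0f} IOPS")
```

With the queue depth held fixed, each reduction in latency raises the IOPS ceiling by the same factor, which is why the two metrics cannot be tuned independently.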

Qumulo’s file system was designed to handle small files as efficiently as large files, which removes the block-size limitations found in other scale-out NAS solutions. To improve performance, many file systems use larger block sizes. That works well for large files but is very inefficient for small ones, because each block can contain only one file. The result can be a large amount of wasted capacity, which is not a problem for Qumulo.
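To make the small-file penalty concrete, here is a minimal sketch of the arithmetic for a generic file system that rounds every file up to whole blocks. The file size, file count, and block sizes are hypothetical, and the model ignores metadata and any packing or compression a real system might apply.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes consumed when a file must occupy whole blocks."""
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    return blocks * block_size


def wasted_fraction(file_sizes, block_size: int) -> float:
    """Share of allocated capacity that holds no file data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    return (allocated - used) / allocated


KIB = 1024
# Hypothetical data set: one thousand files of 2 KiB each.
small_files = [2 * KIB] * 1000

for block_size in (4 * KIB, 64 * KIB, 1024 * KIB):
    waste = wasted_fraction(small_files, block_size)
    print(f"{block_size // KIB:>5} KiB blocks: {waste:.1%} of allocated space wasted")
```

In this model, the wasted share grows quickly as the block size rises relative to the typical file size.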

As stated above, options for cloud computing are becoming increasingly important. Due to the growing size of data sets and the compute-intensive nature of AI and ML, organizations are taking advantage of the cloud. By using Qumulo’s single platform, organizations can seamlessly scale workloads, as and when needed, to AWS or GCP cloud environments for compute-intensive processing, enhanced collaboration, and data storage.

The importance of data protection in HPC environments

Data protection and availability are also important in this industry. Qumulo offers industry-leading data protection using erasure coding. Data is efficiently distributed across multiple nodes to protect against drive failures. Unlike traditional RAID solutions, performance is not affected during rebuilds after a drive failure. Erasure coding also requires less capacity for resilience than RAID (typically 33% less space).
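One way to see where a saving of roughly one third could come from is to compare raw capacity per usable byte. The article does not say which erasure-coding layout or which RAID level the 33% figure refers to; the 6 data + 2 parity layout and the mirrored baseline below are assumptions chosen only because they reproduce a one-third saving.

```python
from fractions import Fraction


def raw_per_usable(data_units: int, redundancy_units: int) -> Fraction:
    """Raw capacity needed to store one unit of usable data."""
    return Fraction(data_units + redundancy_units, data_units)


# Assumed baseline: mirroring, where every unit of data has one full copy.
mirroring = raw_per_usable(1, 1)   # 2x raw capacity

# Assumed erasure-coding layout: 6 data units plus 2 parity units.
erasure = raw_per_usable(6, 2)     # 4/3x raw capacity

savings = 1 - erasure / mirroring
print(f"mirroring needs {float(mirroring):.2f}x raw capacity")
print(f"6+2 erasure coding needs {float(erasure):.2f}x raw capacity")
print(f"raw capacity saved: {float(savings):.1%}")
```

Other layouts give different results, so the figure should be read as typical rather than fixed.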

In addition, Qumulo’s software includes real-time analytics to eliminate data blindness, providing instant visibility across billions of files. With this valuable technology, organizations gain control, with real-time information about the entire storage system, enabling them to predict usage and capacity trends, and more proactively manage current and future storage requirements.

Qumulo’s distributed file system software has been designed from the ground-up to meet today’s requirements for scale, offering the highest-performance file storage system for data centers and the public cloud

Data has changed. So why continue to use storage technology that was designed twenty years ago, when modern file storage technology exists today that is ideally designed for the demanding requirements of HPC? Qumulo is the only file-based, scale-out NAS designed to offer robust capabilities on-premises and in the cloud — and the radically simple way to manage unstructured data in any environment.

Current Qumulo customers in the HPC industry, many of which operate in the life sciences and scientific research sectors, include: National Renewable Energy Laboratory (NREL), Carnegie Institution for Science, CID Research, Channing Division of Network Medicine at Brigham and Women’s Hospital, DarwinHealth, Inc., Georgia Cancer Center at Augusta University, Institute for Health Metrics and Evaluation (IHME) at the University of Washington, Johns Hopkins Genomics at The Johns Hopkins University, Progenity, Inc., UConn Health, University of Utah Scientific Computing and Imaging (SCI) Institute…and many others that are using Qumulo to accelerate their data-intensive workflows and speed discovery of new scientific and medical breakthroughs.
