The San Diego Supercomputer Center Supersizes Advanced Storage With Qumulo
The global scientific research community spans industries, individuals and specialties. However, it does have one thing in common: the need for massive computing and data storage resources.
Few research organizations can afford their own supercomputers and advanced storage systems. Many turn instead to specialized Managed Service Providers (MSPs) that offer remote computing and storage capacity to data-intensive research clients.
The San Diego Supercomputer Center Leads the Charge
The San Diego Supercomputer Center, or SDSC, is a leading MSP for the scientific community in government, academia, and business.
SDSC is a member of XSEDE (eXtreme Science and Engineering Discovery Environment), a single virtual system that enables researchers to interactively share computing resources, data collections, and advanced research tools.
As a research unit at the University of California, San Diego, SDSC uses its on-prem supercomputers to run advanced computation and all aspects of big data storage and analysis, including data integration, performance modeling, data mining, and predictive analytics.
SDSC works with its clients to customize supercomputer and storage system resources for extreme data projects, including astrophysics visualization for the American Museum of Natural History, large-scale simulations of The Big One in southern California, and sophisticated flu season modeling for the Centers for Disease Control.
Two of SDSC’s important projects serve the fast-growing neuroscience research community. The first is the Center’s Neuroscience Gateway (NSG), funded by the National Science Foundation (NSF) and the National Institutes of Health (NIH), which is a collaboration between the Center, Yale University, and University College London. NSGportal gives neuroscience researchers access to large-scale computing for modeling and data processing, which requires managing the large neuroscience datasets stored on its data-intensive storage systems.
The second neuroscience offering, still under development, is the NIH-funded NEMAR (human NeuroElectroMagnetic data Archive and tools Resources) gateway. The gateway is developing open access to archived EEG (electroencephalography) and MEG (magnetoencephalography) data for neuroscientists, and large-scale data storage and management are key parts of the project.
Client Demands Might Outstrip Super Resources
SDSC faced a challenge regarding its storage infrastructure. These data-intensive gateways and client technology stacks must support high-performance and high-capacity data storage for massive amounts of big data – much of it unstructured. Although the Center’s supercomputers easily handle computing tasks, the neuroscience storage systems lacked massive scale-out capacity and the storage features necessary to support big data, fast access, and advanced analytics.
“Our storage requirements for the NSG and EEG/MEG data projects are growing from tens of terabytes to hundreds of terabytes,” said Amit Majumdar, Ph.D., Director of Data Enabled Scientific Computing at SDSC. “Large data transfer and storage, high-speed access, sharing, search functionalities — all of these are becoming more and more important for our projects.”
To successfully meet its client requirements, SDSC needed a storage solution that would provide an optimal balance of performance, capacity, scalability, durability, and advanced functionality, all at a reasonable cost.
“At SDSC, delivering critical analysis and results is paramount, yet high-performance computing workloads are incredibly dependent upon their storage system. As an organization, we are moving towards integration of cloud for both compute and storage, as a part of our science gateways. As a result, it’s important for us to make leading cloud technologies available via our Research Data Services division,” added Majumdar.
Partnering with Qumulo
The impetus for the Center’s search for a new kind of storage provider was a set of new clients who needed over 1 PB of storage capacity. SDSC was concerned about the performance, reliability, and management of its existing storage solutions at that scale.
Brian Balderston, SDSC’s Director of Infrastructure, decided there must be a better way. He tested several high-performance storage systems and decided on Qumulo’s hybrid cloud file storage as a frontrunner in data-intensive computing and storage infrastructure for the national research community.
“I believed that we could build a better storage system for our client that didn’t need quite as much operational care and feeding. So, I reached out to the Qumulo team with our requirements,” said Balderston. “Their distributed scale-out NAS file system met our capacity, performance, data integrity, and scale-out requirements at an acceptable price for our client.”
Qumulo’s file storage differed from the existing infrastructure at SDSC and that used by its client organizations. Most of the Center’s academic clients were accustomed to open-source, parallel file systems for research data workloads. Qumulo’s proprietary software stack and distributed file system were a new kind of storage, and quickly proved to be more advanced and capable of managing massive scientific research workloads, now and in the future.
Qumulo scales unstructured data more efficiently than parallel filesystems, making it ideal for environments with massive file counts, directory structures, and billions of small files. The scale-out NAS file system supports fast ingest and access and is highly searchable. High availability and minimal rebuild times keep data safe and always available – with no data loss.
SDSC’s capital costs for Qumulo were in line with its budget, and its operational costs proved lower than expected. “With Qumulo, we realized much lower operational expenses than we’ve experienced with other storage solutions,” noted Balderston. “Plus, we’ve doubled the size of our cluster and will likely double it again soon.” SDSC passes the savings on to its MSP clients, which makes its hosting platform even more attractive.
Massive Scaling, High Performance
Today, Qumulo provides SDSC and its clients persistent storage for high-capacity, high-performance workloads. Key infrastructure components include virtual machines (VMs), Qumulo storage mounted on a supercomputer, and high-bandwidth networks. SDSC is moving towards integrating on-prem and cloud storage to serve its science gateways. Since Qumulo’s file storage is cloud-native, it seamlessly supports on-prem and cloud integration.
Qumulo optimizes its unique software for fast reads and write-throughs. The accelerated architecture delivers extremely low latency, and high IOPS and throughput performance. Predictive caching and prefetch proactively identify IO patterns and efficiently move data to the fastest media.
Qumulo is also simple to deploy, manage, and access – critical components for both SDSC and its clients. “Qumulo has been incredibly easy for SDSC to manage,” said Balderston.
“Instead of focusing our staff and resources on managing a number of inefficient storage systems, we use our engineering time to work on highly impactful and well-funded grants from the National Science Foundation, the National Institute of Health, and other funding agencies. That is a big win for all of us.”
Qumulo proved that it is a different kind of storage company – a company that built its storage for the modern age. Some legacy storage systems still work for structured data in well-defined traditional storage environments. But these products were never designed for today’s massive data growth, unstructured data types, intensive scientific workloads, and complex applications.
To meet and exceed these new storage requirements, Qumulo designed its software using the principles behind modern, large-scale, distributed databases. The result is a unique file system with unmatched performance and scalability.
Client adoption proves the point at SDSC. “Probably my biggest achievement is standing this storage system up and then getting massive adoption,” Balderston said. “Since the initial proof-of-concept, SDSC has reached a new set of customers, including more than two dozen University of California research labs and departments. I can’t think of any other service that has been adopted this quickly.”
- Effectively store and manage massive unstructured file stores
- Support large and growing scientific research workloads
- Provide high-performance data ingest and access to multiple global clients
- High performance
- High availability and durability
- Ease of deployment, management and access
- Easily scale from TB to PB