Managing Data Volumes, Visibility and a Vision for the Future for Genomics Research and Life Sciences Organizations

October 6, 2020

Authored by:

Qumulo Team

It’s an exciting week for biomedical research, drug discovery and development, and healthcare professionals as the Bio-IT World Conference and Expo Virtual kicks off today. We’re excited to hear from attendees about their file data capture, processing, collaboration, and management needs.

Common Data Management Challenges

I expect that data management will be a hot topic this week, as life science and genomics researchers look for data platforms and services that enable increased computing power, as well as solutions that can scale to handle billions of data points and files efficiently.

We understand that researchers and other biomedical professionals are challenged not only to derive meaningful knowledge from massive volumes of data, but also to analyze and deliver the resulting data faster than ever before.

Qumulo’s goal is to help research organizations focus on their science rather than their storage.

Below are some of the common data management challenges we hear from our customers, and how Qumulo’s file data platform can help.

Challenge #1: Data Volumes

The Medical Futurist Institute estimates that a single human genome takes up 100 gigabytes of storage space. As more and more genomes are sequenced, storage needs will grow from gigabytes to petabytes to exabytes.

“By 2025, an estimated 40 exabytes of storage capacity will be required for human genomic data,” according to the Institute.
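To put those two figures side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and that the 100 GB estimate applies uniformly to every genome, which is our simplification rather than anything the Institute states.

```python
# Back-of-the-envelope check using the two figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
GB = 10**9
EB = 10**18

BYTES_PER_GENOME = 100 * GB    # per-genome estimate quoted above
PROJECTED_CAPACITY = 40 * EB   # capacity estimate quoted for 2025


def genomes_that_fit(capacity_bytes: int, bytes_per_genome: int) -> int:
    """Return how many whole genomes fit in the given capacity."""
    return capacity_bytes // bytes_per_genome


if __name__ == "__main__":
    count = genomes_that_fit(PROJECTED_CAPACITY, BYTES_PER_GENOME)
    print(f"{count:,} genomes")  # prints "400,000,000 genomes"
```

Under those assumptions, 40 exabytes corresponds to roughly 400 million genomes at 100 GB each.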

And that data growth isn’t stopping anytime soon.

When you have a large number of files like these, the directory structure and file attributes themselves become big data.

Qumulo’s file data platform is unique in how it approaches the problem of scalability. It is designed to scale to billions of files, and store all file sizes efficiently. The platform’s design implements principles similar to those used by modern, large-scale, distributed databases. The result is a file data platform with unmatched scale characteristics.

Challenge #2: Data Visibility

When you have billions of files in a storage system, you need a way to manage them.

Administrators of legacy file systems are often hampered by “data blindness,” meaning they can’t get an accurate picture of what’s happening in their file system.

The University of Utah’s Scientific Computing and Imaging (SCI) Institute was all too familiar with this challenge. The organization was confronted by massive data files and equally massive processing and capacity challenges.

“When we run out of capacity, the direction from higher up is inevitably ‘just delete old data’,” Nick Rathke, Assistant Director, Information Technology for the SCI Institute, said. “But which old data? There’s a big distinction between data that’s old and data that’s important, and I can’t tell which is which without running lengthy manual reports.”

Given this lack of visibility, Rathke’s team also struggled to work with users on storage management. “I can’t easily tell them how much they’re using, I can’t dispute the importance of a file that hasn’t been touched in years, I can’t track allocations – it’s an extremely painful process.”
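To illustrate why the “lengthy manual reports” Rathke mentions are so painful, here is a minimal sketch of what such a report might look like on a generic file system. This is not Qumulo’s API and not the SCI Institute’s tooling. The function name `stale_bytes_by_top_dir` is hypothetical, and using modification time as the measure of “old” is our assumption.

```python
import os
import time
from collections import defaultdict
from typing import Dict, Optional


def stale_bytes_by_top_dir(
    root: str, max_age_days: float, now: Optional[float] = None
) -> Dict[str, int]:
    """Sum sizes of files not modified within max_age_days, per top-level directory.

    Hypothetical helper for illustration only. It has to stat every file
    under root, which is why this kind of report is slow at scale.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    totals: Dict[str, int] = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Group by the first path component under root ("." for root itself).
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                totals[top] += st.st_size
    return dict(totals)


if __name__ == "__main__":
    # Example: bytes untouched for a year, grouped by top-level directory.
    for directory, size in sorted(stale_bytes_by_top_dir(".", 365).items()):
        print(f"{directory}: {size} bytes")
```

A walk like this touches every file in the tree, so its running time grows with the file count. It also only tells you what is old, not what is important, which is exactly the distinction Rathke draws.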

Qumulo’s file data platform is designed to give exactly that kind of visibility, no matter how many files and directories there are. You can get immediate insight into throughput trending and hotspots. You can also set real-time capacity quotas, which avoid the time-consuming quota provisioning overhead of legacy storage. This information is available through a graphical user interface and, for programmatic access, through a REST API.

Challenge #3: Realizing a Vision for On-Prem, Public Cloud and Multicloud Data Management

Research organizations are increasingly looking to the cloud to give them more compute resources for their analyses.

What’s most interesting is why this is happening in life sciences now, according to Accenture. “In other industries, cost-effective data storage and accelerated time to market are the primary drivers. Life sciences organizations, however, see leveraging expertise and the ability to focus resources on innovation as the top benefits of migrating to the cloud.

“Because it offers companies the flexibility and ability to scale up infrastructure, informatics and analytics capabilities on-demand rather than wait for large traditional IT deployments, the cloud makes it possible for organizations to move from idea, to experimentation, to large-scale deployment with unprecedented speed.”

Qumulo’s software-defined approach allows our file system to run both on-prem and in the cloud. Qumulo runs on Hewlett Packard Enterprise (HPE) Apollo Gen10 servers and Fujitsu hardware, and in the cloud on Amazon Web Services (AWS) and Google Cloud Platform (GCP).

Public cloud platforms such as AWS or GCP offer life sciences and research organizations flexibility. The inherent ‘elasticity’ of cloud resources enables organizations to scale their computational resources in relation to the amount of data that they need to analyze.

Learn more

Watch this free, on-demand webinar with me, Adam Kraut of BioTeam, Inc., and Emric Delton of ARUP Laboratories, for industry trends and tips for accelerating genomics research: “Accelerating Genomic Research with Hybrid Cloud Solutions.”

Registered attendees of Bio-IT World can visit Qumulo’s booth for interactive Zoom discussions and private demos, or to download case studies, white papers and other materials. To set up a meeting, email Qumulo’s representatives at the show, Brian Conway (bconway@qumulo.com) or Matt Boutin (mboutin@qumulo.com), or tweet us @Qumulo. We’d love to speak with you!
