Four Considerations When Evaluating File Systems For Your Life Sciences Research Computing Environment

The research computing market is more critical now than ever before. For example, drug therapy research is in high demand due to the COVID-19 pandemic, and genomics research is rapidly improving and bringing new treatments and therapies to market faster.

This innovation is due to the ability of life science organizations to process massive amounts of data while also leveraging technologies like artificial intelligence (AI) and machine learning (ML). Research computing workloads like genomic sequencing, data analysis, and research imaging are incredibly dependent upon their data management platform.

Organizations are spending tens of millions of dollars on systems and platforms to capture, process, and store many types of data (e.g., experimental, operational, clinical) from many disparate sources. Additionally, a huge range of instruments, including genomic sequencers, 3D microscopes, and patient imaging systems, stream massive amounts of complex data to centralized systems for analysis. With the continuous evolution of AI, ML, and 3D imaging technologies, the size and amount of data that life science organizations have to manage will continue to scale far beyond petabytes.

As a result, many organizations are assessing modern architectures to consolidate, process and leverage this data.

File data platforms can process huge amounts of data for research computing, often consisting of billions of files, to extract key research information. IoT (Internet of Things) innovations are helping to capture these data points at record levels.

According to Bio-IT World, “With the increased demand in computing power from life science researchers and scientists tackling big data issues, storage and infrastructure must be able to scale to handle billions of data points and files efficiently.”

When evaluating a file data platform for your research computing workloads, you should consider the following:

  • Does my file data platform deliver small file performance as efficiently as large streaming files? A platform that does so removes the “block” size limitations found in other file systems. To improve performance, many file systems use larger block sizes, which is fine for large files but very inefficient for small files, as each block can contain data from only one file. This can lead to huge capacity wastage.

“One of our major replacement criteria was finding a storage system that could bridge that file volume and variety,” says Bill Kupiec, IT Manager for Carnegie’s Department of Embryology. “It had to handle both the streaming needed for very large data sets and the fast processing required for millions of small files. That made locating a workable solution extremely challenging.

“Our research organization falls between the cracks for most storage vendors, with giant imaging sets and millions of tiny genetic sequencing scraps. Finding a system that reasonably handled all our complex workflows was difficult, and in the end only Qumulo was the right fit.”
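
To make the block size point concrete, here is a minimal Python sketch of the arithmetic. It assumes a simplified file system in which each block holds data from only one file, and the workload (one million 2 KiB files) and the block sizes are hypothetical values chosen for illustration, not measurements from any particular product.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Capacity consumed if every block can hold data from only one file."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def wasted_fraction(file_sizes: list[int], block_size: int) -> float:
    """Share of allocated capacity that holds no file data."""
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    if allocated == 0:
        return 0.0
    return (allocated - sum(file_sizes)) / allocated


KIB = 1024
MIB = 1024 * KIB

# Hypothetical workload: one million 2 KiB files (for example, small sequencing outputs).
small_files = [2 * KIB] * 1_000_000

for block_size in (4 * KIB, 64 * KIB, MIB):
    waste = wasted_fraction(small_files, block_size)
    print(f"block size {block_size // KIB:>5} KiB: {waste:.1%} of allocated capacity wasted")
```

Under these assumptions, 4 KiB blocks already leave half of the allocated capacity empty, and the waste grows as the block size increases. A single large file whose size is a multiple of the block size wastes nothing, which is why large blocks look fine for streaming data sets and poor for millions of small files.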

  • Can my organization seamlessly scale workloads, as and when needed, to cloud environments? Due to the growing size of data sets and the compute-intensive nature of AI and ML, organizations are taking advantage of the cloud’s flexibility and resources. The public cloud provides greater compute capacity, access to GPUs, enhanced collaboration, and access to cloud-native AI and ML applications.
  • Does my file system ensure high availability of my data? Qumulo’s file system offers enterprise-level data protection using erasure coding. Data is efficiently distributed across multiple nodes to protect against drive failures. In the event of a drive failure, unlike traditional RAID solutions, performance is not affected during rebuilds. Erasure coding also requires less capacity (typically 33% less space) for resilience than RAID.
  • Does my organization suffer from “data blindness”? Qumulo’s real-time analytics provide visibility and insight across billions of files. Organizations gain control, with information about the entire file data platform, enabling them to predict usage and capacity trends, optimize workflows, and more proactively manage current and future storage requirements.
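
The capacity comparison in the erasure coding point can be sketched with some simple arithmetic. The post does not say which RAID level or which erasure coding layout is being compared, so the example below makes its own assumptions: the baseline is two-copy mirroring (as in RAID 1 or RAID 10), and the data-plus-parity layouts are illustrative only, not Qumulo's actual configurations.

```python
def raw_per_usable_erasure(data_blocks: int, parity_blocks: int) -> float:
    """Raw capacity needed per unit of usable data with data + parity blocks."""
    return (data_blocks + parity_blocks) / data_blocks


def raw_per_usable_mirror(copies: int) -> float:
    """Raw capacity needed per unit of usable data when keeping full copies."""
    return float(copies)


def capacity_saving(scheme_raw: float, baseline_raw: float) -> float:
    """Fraction of raw capacity saved relative to the baseline scheme."""
    return 1 - scheme_raw / baseline_raw


# Assumed baseline: two full copies of every block.
mirror = raw_per_usable_mirror(2)

# Hypothetical erasure coding layouts (data blocks, parity blocks).
for data, parity in ((4, 2), (6, 2), (8, 2)):
    ec = raw_per_usable_erasure(data, parity)
    saving = capacity_saving(ec, mirror)
    print(f"{data}+{parity}: {ec:.2f}x raw per usable, {saving:.1%} less than mirroring")
```

With these assumed layouts, a 6+2 encoding needs about one third less raw capacity than two-copy mirroring, which is in line with the "typically 33%" figure quoted above. Other layouts and other RAID levels would give different numbers.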

Learn more

Qumulo has several helpful resources for learning more about research computing and how our file data platform meets the performance and capacity demands of life sciences organizations, in the data center and in the cloud.

Stop by our virtual booth at Bio-IT World this week – we’d love to speak with you! Also, watch this free, on-demand webinar with me, Adam Kraut of BioTeam, Inc., and Emric Delton of ARUP Laboratories, for some best practices on accelerating genomics research: “Accelerating Genomic Research with Hybrid Cloud Solutions.”

Contact us here if you’d like to set up a meeting or request a demo. And subscribe to our blog for more helpful best practices and resources!
