High-performance storage for big data and analytics
Enable enterprise scale for your analytics workloads.
Big data storage is a growing concern for many companies
The ability to make informed decisions from large datasets is critical for today’s enterprises. The intelligence that companies gain from data analytics fuels their growth and ability to compete.
For example, online advertisers rely on data analytics to optimize ad yield and predict buyer behavior. Social media platforms use it to gain insight into what’s important to their users. Logistics companies analyze vast amounts of data from sensors and devices (IoT) to lower costs and speed delivery. Data analytics is central to the development of autonomous vehicle technologies.
Data sources for analysis include mobile phones, sensors and wearable devices, as well as applications and infrastructure in the data center and the cloud.
Adequate storage is a pressing problem for data analytics of all kinds. It raises several questions:
- How should storage be attached to the compute resources to ensure high availability of data with low latency and horizontal scalability?
- What are the requirements for a file storage system to serve these demanding workloads?
- What are the best strategies for scaling storage over time?
Storage demands of data analytics
Data analytics can generate insights from massive data sets or data streams with a variety of workflows. Two of these workflows are batch (big data) analytics and streaming analytics.
Whether batch or streaming, data analytics demands high performance from the file storage system. One solution has been to attach the compute resources directly to the storage resources. Direct-attached storage creates data silos and is difficult to manage and scale efficiently, but the belief that proximity ensures performance drove its popularity. Its use for data analytics arose from two assumptions: that disk bandwidth exceeds network bandwidth, and that disk I/O makes up a considerable fraction of a task’s lifetime.
With increased networking speeds and more computationally complex analytic techniques, these assumptions no longer hold. Highly scalable network-attached storage can now outperform direct-attached storage. In addition, storage accessed via a network is cost competitive and won’t create data silos. Today, a more effective strategy for data analytics workflows, such as those that use Apache Spark or Spark Streaming, is to scale compute and storage separately with high-performance network-attached storage.
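As a rough illustration of why those two assumptions matter, the following sketch models a task's lifetime as read time plus compute time. The function names and every figure in it are hypothetical, chosen only to show the shape of the argument; they are not measurements of any storage system.

```python
def task_time(data_gb: float, bandwidth_gb_per_s: float, compute_s: float) -> float:
    """Total task time in seconds: time to read the data plus time to compute on it."""
    return data_gb / bandwidth_gb_per_s + compute_s


def io_fraction(data_gb: float, bandwidth_gb_per_s: float, compute_s: float) -> float:
    """Share of the task's lifetime spent on I/O, between 0 and 1."""
    io_s = data_gb / bandwidth_gb_per_s
    return io_s / (io_s + compute_s)


# Hypothetical figures for illustration only; they are not measurements.
DATA_GB = 10.0
scenarios = {
    # name: (storage bandwidth in GB/s, compute time in seconds)
    "older: local disk, light compute": (0.5, 5.0),
    "older: slow network, light compute": (0.1, 5.0),
    "newer: local disk, heavy compute": (0.5, 200.0),
    "newer: fast network, heavy compute": (1.0, 200.0),
}

for name, (bandwidth, compute_s) in scenarios.items():
    total = task_time(DATA_GB, bandwidth, compute_s)
    share = io_fraction(DATA_GB, bandwidth, compute_s)
    print(f"{name}: total {total:.0f} s, I/O share {share:.0%}")
```

In this simplified model, where storage is attached matters a great deal when I/O dominates the task and the network is the slower path. Once compute dominates and network bandwidth is comparable to disk bandwidth, the attachment method has little effect on total task time.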
Video Case Study
Learn how researchers at the Scientific Computing and Imaging Institute (SCI) at the University of Utah use Qumulo to cut their processing time from months to days.
Qumulo for big data storage and analytics
Qumulo’s software is a modern file storage system that has the performance, scalability and enterprise features required by data analytics workloads. Qumulo runs on standard hardware on premises and as EC2 instances on AWS.
Get your results faster
Qumulo has better sustained read throughput than direct-attached storage for analytic workloads. The performance edge of Qumulo comes from its hybrid SSD/HDD architecture and its advanced distributed file system technology.
Buy only the storage you need
With Qumulo, you control how much storage you buy and can avoid overprovisioning. You save money by buying only the storage you need, regardless of how your compute cluster grows.
Eliminate data silos
Solve storage problems in real time
Qumulo lets administrators find and solve problems in real time. It’s easy to manage your projects and users when you have insight into how the storage is being used.
Run in the cloud and on premises
Continuous replication means you can easily transfer data from your on-premises Qumulo cluster to your Qumulo cluster in AWS, perform your computations, and then transfer the results back to the on-premises storage.
Data analytics workflow
Here is an example of a streaming data analytics workflow that shows Qumulo as the central storage for the entire process, from ingesting the data to displaying it and acting on it.
Input can come from devices, such as cell phones, scientific instruments, autonomous vehicles and serial devices. It can also come from applications, which typically store their data in Qumulo’s file system and then send a link to the event data flow software packages. The compute resources process the data and both store and retrieve files from Qumulo. Finally, the results are delivered and either displayed as information on a dashboard or used to trigger a particular action, such as a security alert.
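The flow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Qumulo's API: the shared file system is assumed to be reachable as an ordinary mounted path (a temporary directory stands in for it here), an in-memory queue stands in for the event data flow software, and the function names, file layout and alert threshold are all hypothetical.

```python
import json
import queue
import tempfile
from pathlib import Path

# Hypothetical stand-in for a mounted file share; a real deployment would
# point at the path where the shared file system is mounted.
SHARED_STORAGE = Path(tempfile.mkdtemp(prefix="shared_storage_"))

# Hypothetical stand-in for the event data flow software: it carries links
# (file paths), not the data itself.
events = queue.Queue()

ALERT_THRESHOLD = 90.0  # hypothetical threshold for triggering an action


def ingest(device_id: str, readings: list) -> None:
    """Store incoming data as a file on shared storage, then publish a link to it."""
    path = SHARED_STORAGE / f"{device_id}.json"
    path.write_text(json.dumps({"device": device_id, "readings": readings}))
    events.put(str(path))


def process_next() -> dict:
    """Take one link from the event flow, read the file, and store a result file."""
    source = Path(events.get())
    record = json.loads(source.read_text())
    peak = max(record["readings"])
    result = {"device": record["device"], "peak": peak, "alert": peak > ALERT_THRESHOLD}
    # The result goes back to the same shared storage, next to the input file.
    source.with_name(source.stem + ".result.json").write_text(json.dumps(result))
    return result


def deliver(result: dict) -> str:
    """Either trigger an action (an alert) or present the result as dashboard information."""
    if result["alert"]:
        return f"ALERT: {result['device']} peaked at {result['peak']}"
    return f"dashboard: {result['device']} peak {result['peak']}"


ingest("sensor-1", [70.2, 71.0, 69.8])
ingest("sensor-2", [88.5, 95.1, 91.3])
while not events.empty():
    print(deliver(process_next()))
```

The point of the design is that only a link travels through the event flow. Both the producers and the compute resources read and write the actual files on the same shared storage.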
“Managing data with Qumulo is so simple it’s hard to describe the impact. It has given us tremendous ROI in terms of time saved and problems eliminated, and having that reliable storage we can finally trust makes us eager to use it more broadly throughout the company.”
John Beck — IT Manager Hyundai MOBIS