AI/ML Everywhere with Scale Anywhere™

September 19, 2023

Authored by:

Kiran Bhagheshpur

You’re knee-deep in planning your next AI/ML solution. Have you made up your mind on all your storage-related decisions? On-premises versus cloud? Object versus file? Where will you run your AI/ML solutions?

By the way, before I start, I have to get something off my chest. I hate the term artificial intelligence. Seriously, AI is less intelligent than a doorknob! It’s just math and stats applied to simply ginormous volumes of data, executed every hour on more processing units than existed in the entire world just a few decades ago.

Here is a great article about what is REALLY happening behind the scenes at ChatGPT. I hate the term AI so much I’m going to use ML for this post.

Okay, off my soapbox. Let’s talk about why storage is so important for AI – I mean ML …

ML is all about data. You just have to find it!

The data that power ML workloads are everywhere. So is the need for ML solutions. Some examples (all derived from our customers’ real-world workflows):

Factories. Cameras placed everywhere capture all manner of information in real-time automated manufacturing operations. All of this is aggregated continuously on local storage.

Data from every manufacturing line are centralized (cloud or on-premises) and become the source for training models to automate failure detection. ML solutions at each factory then run the resulting inference models to improve yields.

By the way, we have customers leveraging Qumulo in this exact workflow!

Self-Driving Vehicles. High-def multi-spectrum video of the real world is continuously captured, initially by hundreds of test vehicles, then by thousands of experimental vehicles, and eventually by millions of regular vehicles.

The cars send these videos back to the mothership via your Wi-Fi (Tesla, I’m looking at you!). This footage is the source data for training the models that assist drivers, enable autopilot, and make self-driving cars a reality.

Inference models derived from the above are executed by inference engines on the cars to process real-time data.

And yeah, we have customers using Qumulo in this workflow too!

ML-Assisted Security. Hundreds of thousands of network devices generate activity logs that are consolidated locally and aggregated (either in the cloud, on-premises, or both). This forms the training data set for models to detect unauthorized network intrusions.

Modern network devices use these inference models to analyze real-time observed events, looking to spot unauthorized intrusions.

You guessed it, we have customers doing this today!

Can you spot the common thread? This is an “everywhere” workflow: from the edge of the network, to the core, and into public clouds.

Choices, Choices, Choices

What about the ML solution you are working on? As these use cases did, you’ll have to make a series of important choices about where and how you build your models. Let me explain …

Edge, Core, or Cloud? Where will your data live? Where will the model live? Where will the solution live? The cloud folks insist it is (and always will be) in the cloud. And, yes, they are biased, but they also make a good point.

After all, what organization can possibly keep up with the infrastructure needed to operationalize training LLMs when pretty much the entire stack is changing on a weekly basis? I’ve spoken to many organizations who say ML was the last straw regarding their internal cloud versus on-premises debate.

But before we say “game over,” here’s an interesting trend. I talk to lots of customers who spend hundreds of millions of dollars each quarter on the cloud and who are repatriating their steady-state workloads back to on-premises. Why? They feel that for a mature ML solution, on-premises provides a more stable and cost-optimized solution.

In other words, rapid experimentation and early operationalization live in the cloud, whereas mature ML solutions enjoy more stability and better economics in owned and operated data centers. But that’s just one choice …

Object vs. File? Before you answer, consider this. Cloud handles object storage extremely well yet sucks at file data. And on-premises handles file data really well but sucks at object. And, as we just discussed, you’ll likely need both cloud and on-premises. Which is better for ML? Well … it’s complicated.

For one thing, most LLMs are open source and expect to access data across a local storage interface. That’s a problem in the cloud, where you must create bespoke data loaders that copy data from object storage to local disk (instance-attached NVMe or EBS / Managed Disks) before those data-hungry GPUs can be fed. Look at what Google Cloud says about this:

“But when it comes time for an AI workload to actually access [AI] data, it isn’t always straightforward, since most AI workloads require file system semantics, rather than the object semantics that Cloud Storage provides.”
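To make the "bespoke data loader" idea concrete, here is a minimal Python sketch of the staging step: copy every object under a prefix onto local disk so that training code can read ordinary files. Everything in it is an assumption for illustration. `ObjectStore`, `InMemoryStore`, and `stage_to_local` are hypothetical names, not part of any cloud SDK or of Qumulo, and the in-memory store simply stands in for a real bucket so the example runs without credentials.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal object-store interface (not a real SDK)."""

    def list_keys(self, prefix: str) -> Iterable[str]: ...

    def get_bytes(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a cloud bucket so the sketch runs offline."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)

    def list_keys(self, prefix: str) -> Iterable[str]:
        # Object stores are flat; "directories" are just key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def get_bytes(self, key: str) -> bytes:
        return self._objects[key]


def stage_to_local(store: ObjectStore, prefix: str, dest: Path) -> List[Path]:
    """Copy every object under `prefix` into `dest`, skipping files already staged."""
    staged: List[Path] = []
    for key in store.list_keys(prefix):
        target = dest / key
        if not target.exists():
            # Recreate the key's prefix as real directories on local disk.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(store.get_bytes(key))
        staged.append(target)
    return staged


# Made-up sample objects: two training frames and one unrelated log file.
bucket = InMemoryStore({
    "train/cam01/frame-0001.bin": b"\x00\x01",
    "train/cam02/frame-0001.bin": b"\x02\x03",
    "logs/run.txt": b"ignored",
})
scratch = Path(tempfile.mkdtemp())  # stands in for instance-attached NVMe
local_files = stage_to_local(bucket, "train/", scratch)
```

Only after this copy finishes can a file-oriented training job point at `scratch` and start reading. At real dataset sizes that copy is extra time, extra local capacity, and extra code to maintain, which is the friction the quote above describes.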

What to Do?

So many questions! On-prem or cloud for your AI/ML workloads? File or object as the repository of the data that powers your LLMs? How will you aggregate data from various locations and manage it across the entire life cycle? The decisions you make impact which storage solution is best for you: Dell, VAST, or NetApp.

Or … do they?

Introducing Scale Anywhere™ from Qumulo, a 100% software-based unstructured data storage and management solution. Need a core data center solution? Check. Public cloud? Check. File? Check. Object? Check. Make whatever ML decisions you want. We simply don’t care. Qumulo runs wherever you need us to.

We have customers who span multiple storage server hardware platforms on-premises and multiple public clouds. The beauty is that their Qumulo-based unstructured data workloads are unified across all of them.

I am in front of customers all the time, and I get a lot of positive feedback on this. That’s a new thing for me compared to other companies I’ve worked at, but I got used to it quickly!

Take ML Choices in Stride with Qumulo’s Scale Anywhere™.

Implementing ML requires a lot of tough decisions. But which storage platform to use isn’t one of them. Try Scale Anywhere™ with Qumulo, and you can manage all your ML workloads regardless of the decisions you make …

Edge, Core, or Cloud
File or Object
For training data collection, aggregation, and curation
Or … to enable you to push your inference models out to your distributed edge

Qumulo is your easiest ML choice.
