By David Bailey
We’re going to talk today about the machine data multiplier, or the third age of data. But first, a history lesson in the explosion of enterprise data over the past few decades.
The early days: Mid-to-late 1990s
When you think about storage systems and what they were built for back in the mid- to late-1990s, they were really made to process business data. Systems primarily served as databases or handled online transaction processing. Back then, storage systems were very large, refrigerator-sized behemoths, and each supported only one or two mainframe systems (sometimes very large ones). These were also the days when a database of 200 GB, or even a terabyte, was considered a very large database and a very large amount of storage to manage.
Also, these early systems might have had only a dozen or so controllers to attach systems to. So admins weren’t managing a ton of servers generating the data, because there simply weren’t many server connections to manage.
And so, in the late 1990s, you start to see the invention of SANs, or storage area networks, where you had more and more systems that you wanted to connect to the same storage system, whether for email servers or other workloads. The advent of SANs greatly increased the total number of connections you could make to those storage systems, both from a throughput perspective and simply because of the sheer volume of servers coming onto the market.
The emergence of “human enterprise content” – emails, spreadsheets and digital photos
This led to the next generation wave of data being generated. We classify this as “human enterprise content” coming from a lot of external sources. This is unstructured file data generated by individual users – from sources like emails, Word documents, Excel spreadsheets, and so on. Some of the newer social networks began to launch – the MySpaces of the world, and others. Companies like NetApp and Isilon started to gain a big foothold in the market.
The other major data source at this time (creating 10x growth!) came from the rise of digital photography. Digital cameras went into mass production during the late 1990s and early 2000s, and consumers realized they needed a place to store their photos, since early digital cameras didn’t come with much storage. Users also wanted to be able to print those photos. This led to the rise of companies like Ofoto and Shutterfly, which let consumers upload their digital photos to their systems.
Finally, in the early 2000s, there were companies like RealNetworks, which at the time streamed baseball and NFL games and other types of video content. Sure, it was small, postage stamp-sized video back then, but it was just the beginning of what you see now in online video, as more companies began putting live, real-time content onto the network for users to view.
Entering the age of machine-generated data
And this brings us to today. We still see the rise of unstructured file data, but much of it now is machine-generated. Examples include satellite imagery, or data from sensors that gather info about cars going through toll bridges and toll roads. There are also vast amounts of web log data – logs coming off of file servers, logs coming off of switches – used to see who’s accessing these resources from an auditing perspective.
Another source of machine-generated data – in some cases, petabytes of it – comes from the bioinformatics side of the house, where instruments such as genetic analyzers run analytics for cancer studies and the like. These data volumes are increasing 100x!
And this is where Qumulo has positioned itself perfectly to handle those huge volumes, so that our customers can see how their data is growing and understand how it’s being used, while also finding it extremely easy to manage that data going forward.
One of the problems that we see with machine-generated data and managing these vast amounts of information – whether it’s hundreds of terabytes or petabytes in size – is data blindness. This is caused when vast amounts of data are generated very quickly from lots of different machine data sources. Being able to understand how that data grows, and what’s actually in that data, is paramount for how you manage it. For many enterprises, it’s not uncommon to see 5-10 TB of new data per day landing in these storage systems. Understanding that scale and how to manage that growth is very important.
The other thing we have to look at with machine-generated data is the deployment methodology. Will the data be stored on-prem, in the cloud, or in a combination of both? Do you have to manage data across different platforms or vendors’ machines?
Unleash the power of your file-based data with Qumulo
Qumulo’s mission is to help enterprises unleash the power of their file-based data. We believe this data is the engine of today’s innovation. Businesses everywhere are struggling with the immense scale of the data they’re creating, curating, analyzing or simply being asked to manage. From terabytes to petabytes, this challenge grows each day, and the burden of not being able to derive intelligence from that data further compounds the problem.
It’s critical that users be able to effectively store, manage, and understand their data in order to make strategic, actionable decisions from it.
Would you like to learn more about the third age of data, and find out whether your current storage infrastructure can meet the growing data deluge? Contact us for more information here.