Splunk is the leading platform for machine data. It gathers many types of log and machine-generated data and, because it is highly scalable, can index, analyze, and visualize very large data sets. Splunk provides both historical and real-time data analytics, and a large supporting ecosystem has developed around it, including machine learning libraries and various software development kits (SDKs).
This diagram shows many of the inputs that Splunk can accept and the ways it can process that information.
The main components of any Splunk implementation are forwarders, indexers, and search heads. Forwarders are typically software agents that run on the devices Splunk monitors; they forward streams of logs to the indexers. Indexers are the heart of the Splunk architecture: they parse and index data in real time. Search heads are separate servers to which users connect to query data, build reports, and create visualizations. (In smaller environments, indexers and search heads can run on the same servers.)
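As a minimal sketch of how a forwarder is pointed at the indexing tier (the hostnames are illustrative; 9997 is Splunk's conventional receiving port), an outputs.conf on a forwarder might look like this:

```ini
# outputs.conf on a forwarder -- hostnames are hypothetical
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# The forwarder load-balances across the listed indexers
server = indexer1.example.com:9997, indexer2.example.com:9997
```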
This diagram shows the Splunk architecture. The indexers contain hot (H), warm (W) and cold (C) buckets, which we’ll discuss in the next section.
Data in Splunk is stored in buckets that reflect the age of the data:

- Hot buckets are open for writing and hold the most recently indexed data.
- Warm buckets are read-only and hold data that has recently rolled out of hot buckets.
- Cold buckets hold older data, typically on cheaper storage.
- Frozen data has aged out of cold buckets and is either archived or deleted; archived data can later be restored into thawed buckets.
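Bucket locations and aging are configured per index. As an illustrative sketch (the index name, paths, and retention values below are assumptions, not recommendations), a stanza in indexes.conf might look like this:

```ini
# indexes.conf -- illustrative values only
[my_index]
homePath   = $SPLUNK_DB/my_index/db        # hot and warm buckets
coldPath   = $SPLUNK_DB/my_index/colddb    # cold buckets
thawedPath = $SPLUNK_DB/my_index/thaweddb  # restored frozen data
frozenTimePeriodInSecs = 7776000           # roll cold data to frozen after ~90 days
```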
Splunk can use direct-attached storage (DAS) for all bucket types, but this is relatively inefficient. If reliability is required, the Splunk replication factor (RF) and the search factor (SF) need to increase. The RF sets the number of copies of the raw data, while the SF sets the number of copies of the index data. Both have a default value of two but can be changed at implementation time. A factor of two means that all stored data is doubled, which already implies a lot of storage.
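The relationship between RF, SF, and the resulting on-disk footprint can be sketched with a rough estimate. The compression and index-overhead ratios below are illustrative assumptions, not figures from Splunk or this article:

```python
# Rough storage-footprint estimate for a replicated Splunk index.
# Assumptions (hypothetical, for illustration): raw data compresses to
# ~15% of ingested size, and index files add ~35% of the ingested size.
# RF copies of the raw data and SF copies of the index files are kept.

def estimated_storage_gb(daily_ingest_gb, retention_days,
                         replication_factor=2, search_factor=2,
                         raw_compression=0.15, index_overhead=0.35):
    """Return an approximate on-disk footprint in GB for one index."""
    total_ingest = daily_ingest_gb * retention_days
    raw_copies = total_ingest * raw_compression * replication_factor
    index_copies = total_ingest * index_overhead * search_factor
    return raw_copies + index_copies

# Example: 100 GB/day retained for 90 days, with the factor of two
# described above for both RF and SF
print(round(estimated_storage_gb(100, 90)))  # prints 9000
```

Under these assumptions, raising either factor by one adds the corresponding raw or index copies again, which is why higher reliability on DAS quickly becomes expensive.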
DAS storage is complex to manage, and this complexity increases as capacity grows. Whether you are using JBODs or RAID arrays, there is a significant administration overhead in both cases. Traditional RAID arrays also have extremely long rebuild times, which translates into an increased risk of data loss.
A much better solution for the majority of data, which sits in cold buckets, is Qumulo File Fabric (QF2). QF2 is a modern, highly scalable file storage system that can be deployed on standard hardware from Qumulo and third-party vendors such as HPE, or in the cloud. QF2 is easy to install, manage, and expand. Its visibility into the file system lets administrators identify and solve problems in real time.
A QF2 cluster has a minimum of four nodes, and it can scale to many petabytes of capacity by adding more nodes. The QF2 hybrid model (an all-flash version of QF2 is also available) uses SSDs as a relatively large write and read caching layer and HDDs to store colder data. Thanks to this hybrid architecture, all writes and many reads are served directly from SSDs, while the economics of a QF2 cluster are largely dictated by the HDDs.
This diagram shows the QF2 hybrid architecture.
Even though Splunk doesn’t yet support using network-attached storage (NAS) for hot and warm buckets, QF2 is an excellent solution for cold buckets. When buckets are moved from the storage defined for warm buckets to the QF2 cluster for cold buckets, all data lands first on SSDs. This makes the transfer very fast. Also, cold buckets are still indexed and searchable, and data that is on the SSDs will be served at much higher speeds than data on HDDs.
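In practice, this split is expressed by pointing an index’s cold path at the NAS mount. As a sketch (the mount point and index name below are hypothetical), the relevant indexes.conf settings might look like this:

```ini
# indexes.conf -- hot/warm stay on local storage, cold moves to
# a hypothetical NFS mount backed by the QF2 cluster
[my_index]
homePath = $SPLUNK_DB/my_index/db            # hot and warm buckets on DAS
coldPath = /mnt/qf2/splunk/my_index/colddb   # cold buckets on QF2
```

When a warm bucket ages out, Splunk moves it to the coldPath location, where it remains indexed and searchable.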
Stefan Radtke has spent his career working in technology, and comes to Qumulo as a principal evangelist of universal-scale storage for EMEA. Most recently, Stefan has been working to bring the best storage solutions to the automotive industry.