Attributes of a Modern File Storage System

About two decades ago, several companies developed parallel and distributed file storage systems. The impetus was that data had begun to grow exponentially, and it became clear that scale-out storage was the paradigm to follow for large data sets. WAFL, IBM Spectrum Scale (aka GPFS), Lustre, ZFS and OneFS are all examples of scale-out file storage systems. These systems have something in common: they all had their “first boot” sometime around the year 2000. They also all have their strengths and weaknesses. Some are not really scale-out; others are difficult to install and operate; some require special hardware or don’t support common NAS protocols; and some have scalability limits or are slow to innovate.

The fact that these storage systems were designed 20 years ago is a problem. Many important technology trends, such as DevOps, big data, converged infrastructure, containers, IoT and virtual everything, emerged well after 2000, so these file storage systems are now used in situations they were never designed to handle. It is clearly time for a new approach to file storage.

Qumulo was designed by several of the same engineers who built scale-out storage at Isilon roughly 15 years ago, and that experience led them to a modern and flexible solution.

The modern file storage system is hardware-independent

Several data storage vendors say their product is independent of hardware-specific requirements. They may have used the term “software defined.” Two qualities of a software-defined product are:

  1. Independent of any hardware-specific dependencies
  2. Programmatically extensible

Qumulo fulfills both requirements. You can run Qumulo on standard hardware provided by Qumulo, on HPE Apollo 4200 servers, and in AWS. For development and testing purposes, Qumulo offers a free OVA package so that you can run a fully functional cluster on VMware Workstation or Fusion. You can also run a standalone instance of Qumulo, with 5TB of storage, on AWS for free; you pay only for the AWS infrastructure.

Because Qumulo is fully manageable via an API, it’s fully extensible and it can be integrated into any operational environment.

The modern file storage system runs in user space

The Qumulo OS is built on Ubuntu. Qumulo developers can leverage all the capabilities of the Linux ecosystem.

The Qumulo file storage system processes run in Linux user space rather than in kernel space, which has a number of advantages:

  • Qumulo has its own implementations of protocols such as SMB, NFS and LDAP, which are independent of the underlying OS. For example, NFS runs as a service with its own notions of users and groups. This makes Qumulo more portable.
  • Kernel mode is primarily for device drivers that work with specific hardware. By operating in user space, Qumulo reinforces its hardware independence. It can run in a wide variety of configurations and environments.
  • Running in user space means that Qumulo can develop and deliver features at a much faster pace.
  • Running in user space improves Qumulo reliability. As an independent user-space process, Qumulo is isolated from other system components that could introduce memory corruption, and the Qumulo development processes can make use of advanced memory verification tools that allow memory-related coding errors to be detected prior to software release. By using a dual partition strategy for software upgrades, Qumulo can automatically update both the operating system and Qumulo software for fast and reliable upgrades. You can easily restart Qumulo without having to reboot the OS, node or cluster.
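
The dual partition upgrade mentioned in the last point can be illustrated with a generic A/B slot pattern. The following is a minimal sketch and not Qumulo's implementation: the class name DualPartitionNode, the slot labels, the health_check callback and the rollback behavior are all assumptions made for illustration. The article states only that two partitions are used.

```python
class DualPartitionNode:
    """Toy model of a node with two software slots (hypothetical names)."""

    def __init__(self, version):
        # Slot "A" holds the running release; slot "B" starts out empty.
        self.slots = {"A": version, "B": None}
        self.active = "A"

    @property
    def standby(self):
        return "B" if self.active == "A" else "A"

    @property
    def running_version(self):
        return self.slots[self.active]

    def upgrade(self, new_version, health_check):
        """Install into the standby slot, switch over, roll back on failure."""
        previous = self.active
        target = self.standby
        self.slots[target] = new_version   # the live slot is never modified
        self.active = target               # switch to the new release
        if not health_check(new_version):  # assumed post-upgrade check
            self.active = previous         # the old release is still intact
            return False
        return True


node = DualPartitionNode("1.0")
first = node.upgrade("1.1", health_check=lambda v: True)
second = node.upgrade("1.2-broken", health_check=lambda v: False)
print(node.running_version, first, second)
```

Because the new release is written to the inactive slot, a failed upgrade in this sketch leaves the previously running release untouched and ready to resume.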

Interactive API, real-time analytics and quotas

Qumulo is programmatically extensible. It has a complete API, which can be extended and integrated into any datacenter environment.

If you like, you can use the API as the primary interface for all your management and operation tasks. However, for convenience, there is also a web UI and a CLI available. Both the UI and CLI use the same API that anyone can use to interact with Qumulo. The API and Python bindings are documented and available on GitHub. The same is true for the command line wrapper, qq.

One of Qumulo’s smartest features is its real-time analytics. Metadata, such as bytes used and file counts, is aggregated when files and directories are created or modified, which means the information is available immediately, without expensive tree walks of the file storage system.
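
The idea of aggregating metadata at write time can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions, not Qumulo's data structure: the Directory class and its mkdir and write_file methods are hypothetical names. Every write pushes its size and count deltas up to the root, so each directory already holds the totals for its subtree.

```python
class Directory:
    """Directory node that keeps running totals for its whole subtree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.files = {}       # file name -> size in bytes
        self.children = {}    # subdirectory name -> Directory
        self.total_bytes = 0  # aggregate for this directory and all below it
        self.file_count = 0   # aggregate for this directory and all below it

    def mkdir(self, name):
        child = Directory(name, parent=self)
        self.children[name] = child
        return child

    def write_file(self, name, size):
        """Create or overwrite a file and update every ancestor's totals."""
        previous = self.files.get(name)
        self.files[name] = size
        delta_bytes = size - (previous or 0)
        delta_count = 1 if previous is None else 0
        node = self
        while node is not None:  # walk up to the root, not down the tree
            node.total_bytes += delta_bytes
            node.file_count += delta_count
            node = node.parent


root = Directory("/")
projects = root.mkdir("projects")
scratch = root.mkdir("scratch")
projects.write_file("a.dat", 4096)
projects.write_file("b.dat", 1024)
scratch.write_file("tmp.bin", 512)
projects.write_file("a.dat", 2048)  # overwriting with a smaller file

print(root.total_bytes, root.file_count)
```

In this sketch the cost of a write grows with the depth of the path, while reading the usage of any directory is a single lookup regardless of how many files sit beneath it.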

The web UI includes a large number of real-time dashboards and graphs, such as IO hotspots and throughput hotspots, and all the data can be retrieved via the well-documented API if you would like to process it with other tools.
