What Makes Qumulo’s Scale-Out File System Unique on AWS? (Part 1)

File System for AWS Cloud Migration

This 2-part series explores what makes Qumulo’s distributed, scale-out file system unique on AWS. In part 1 below, you’ll learn how the Qumulo Cloud Q software architecture is built for performance and dynamic scalability — supporting multi-protocol file access for high-performance workloads running in the cloud or in hybrid cloud environments. In part 2, we focus on how to copy objects between S3 buckets and clusters.

Why Enterprise File Services are Needed in the Cloud

Between the rapid growth of unstructured data, ever-increasing storage capacity requirements, and tight budgets, IT departments are running into a data center problem: capital expenditures are becoming more difficult to justify, and lack of scalability is a roadblock to innovation. Cloud migration is an obvious answer, offering elastic performance and storage scalability while helping to control the costs of a high-performing data strategy.

Lift and Shift Cloud Migration

Enterprises worldwide are choosing to move their data and applications to the cloud, but for many, the question becomes how to get there quickly and with minimal risk. One of the fastest cloud migration methods is “lift and shift,” which means moving existing applications to the cloud without major redesigns. And because the majority of on-premises applications work with file systems for Unix/Linux and/or Windows, enterprise-class file systems are needed in the cloud.

The Challenges of Migrating Enterprise Data to the Cloud

As organizations migrate petabyte-scale, high-compute workloads to the cloud, they are faced with unique challenges including choosing a scalable enterprise data storage solution capable of storing, managing, and building High-Performance Computing (HPC) workflows and applications with data in its native form.

When migrating file system-dependent workloads to the cloud, CIOs and system administrators require a solution that addresses the following migration challenges:

  • Access to the data should be possible from any protocol at the same time
  • Permissions and ACLs should be “translated” transparently between POSIX and Windows and potentially other protocols such as FTP or HTTP
  • The solution should have the enterprise features that storage administrators use on-premises, such as snapshots, quotas, Kerberos integration, and UID/SID mapping
  • At the same time, the solution should be software-defined with cloud-native integration; for example, automated deployment through AWS CloudFormation templates or Terraform, as well as integration with Amazon CloudWatch
  • The solution should be scalable and allow expansion of capacity and performance in real-time without any service interruption
  • The system should be able to deal with billions of files without the requirement of performing tree-walks for certain operations such as backups, analytics, or the creation of usability statistics
  • The solution should support SMB, NFS, and sometimes FTP
  • Companies with a multi-cloud strategy want a similar file solution across clouds with the same APIs, management, cloud integration, performance tiers, backup methods, access protocols, etc.
  • Ideally, the solution allows moving data between the file system and Amazon Simple Storage Service (S3) because in many cases, their central data repository lives in S3
  • Alternatively, they may have data on the file system that they want to process with an Amazon native service that operates on file data in S3
  • The file system should support a hybrid cloud environment to easily move on-premises data to the cloud
  • Ideally, the solution includes real-time performance and capacity analytics to gain insight into usage patterns, utilization, and cost optimization

Qumulo recognized that legacy scale-out and scale-up solutions were not designed to handle today’s data volumes, file types, applications, and workloads. Legacy data storage systems simply can’t provide a path to the cloud — so we built a better one.

Below, we’ll outline how the Qumulo Core software addresses these requirements on-premises and in the cloud. We explore in detail how our unique hybrid cloud approach significantly simplifies migrations of unstructured data to AWS and related applications, enabling you to manage data seamlessly between your data center and cloud environments.

A Cloud-Native File Storage Solution Built on EC2, EBS, & S3

Qumulo Cloud Q for AWS is a cloud-native file storage solution that is built on top of Amazon Elastic Compute Cloud (EC2), Amazon Elastic Block Store (EBS) volumes and Amazon Simple Storage Service (S3). It delivers many interesting features which go beyond other enterprise data storage solutions, including:

  • AWS Outposts support
  • Available in AWS GovCloud (US)
  • Scale-out architecture: scales to 100 instances, currently 30+ PB in a single namespace
  • Ultra-high aggregated throughput with low latencies at around 1ms on average
  • Multi-protocol: files can be accessed through NFS/SMB/FTP/HTTP simultaneously
  • Native and directory-based copying of file data into an S3 bucket and back
  • Fully-programmable API
  • Advanced AWS CloudFormation templates (CFTs) for automated deployments
  • Kerberos/Active Directory integration
  • Snapshot integration
  • Real-time quotas
  • Multi-cloud replication and on-premises to AWS replication

How Is the Qumulo Core File System Built?

The Qumulo Core hybrid cloud file system is built as a user space application that runs on top of a stripped-down Ubuntu LTS version, which is updated frequently. It is a clustered system that starts at 4 nodes and, to date, scales to 100 nodes. The smallest cluster can be as small as 1 TB, while the largest deployment can currently host 30.5 PB of data. Deployment is done through the provided AWS CloudFormation templates and the AWS Quick Start for Qumulo Cloud Q.

Related story: Qumulo’s Core Architecture is Built with Hardware Flexibility in Mind

The following figure illustrates a minimal stack, deployed through a CFT, that follows the principles of the AWS Well-Architected Framework.

Figure 1: Minimal Qumulo cluster deployed in a private subnet.

Let’s break it down. As a best practice, a Qumulo cluster is deployed in a private subnet. The m5 and c5n instance types are currently supported, and the instance type determines performance to a large extent (more about performance later). The storage space is made up of EBS volumes. Depending on the node type, volumes are either gp2 volumes (all-flash nodes) or a mix of gp2 and sc1 or st1 volumes (hybrid nodes). Each node gets a static internal IP address and typically 3 floating IP addresses that fail over to the remaining nodes if a node fails. Optionally, the cluster can also be configured with an Elastic IP per node if public IP addresses are needed.
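The floating IP behavior can be illustrated with a small sketch. This is not Qumulo’s actual failover logic: the node names, the addresses, and the "least-loaded survivor" policy are assumptions, chosen only to show how the floating IPs of a failed node can be spread across the remaining nodes.

```python
from typing import Dict, List


def fail_over(assignments: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Return a new assignment in which the failed node's floating IPs
    have been moved to the surviving nodes (hypothetical policy)."""
    if failed not in assignments:
        raise KeyError(f"unknown node: {failed}")
    survivors = {n: list(ips) for n, ips in assignments.items() if n != failed}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over floating IPs")
    for ip in assignments[failed]:
        # Pick the survivor that currently holds the fewest floating IPs;
        # ties are broken by node name so the result is deterministic.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(ip)
    return survivors


# Hypothetical 4-node cluster with 3 floating IPs per node.
cluster = {
    f"node-{i}": [f"10.0.1.{100 + 3 * i + j}" for j in range(3)]
    for i in range(4)
}
after = fail_over(cluster, "node-2")
for node, ips in sorted(after.items()):
    print(node, ips)
```

In this example each of the three surviving nodes ends up serving four floating IPs, so clients connected to the failed node's addresses can reconnect without changing the address they use.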

A Lambda function is deployed to health-check all EBS volumes and automatically replace any volume that fails. Another Lambda function gathers detailed metadata metrics from the cluster and stores them in Amazon CloudWatch Logs.
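As a rough illustration of the health-check step, the sketch below filters the response of the EC2 DescribeVolumeStatus API (available in boto3 as `describe_volume_status`) for volumes reported as impaired. It is an assumption about how such a check could work, not the code of the Lambda function that ships with the deployment; the function name and the sample volume IDs are hypothetical, and the replacement step is left out.

```python
from typing import Any, Dict, List


def impaired_volumes(response: Dict[str, Any]) -> List[str]:
    """Return the IDs of volumes whose status is reported as 'impaired'.

    `response` is expected to have the shape returned by the EC2
    DescribeVolumeStatus API, for example from boto3's
    `ec2_client.describe_volume_status()`.
    """
    return [
        item["VolumeId"]
        for item in response.get("VolumeStatuses", [])
        if item.get("VolumeStatus", {}).get("Status") == "impaired"
    ]


# Hypothetical sample response; the volume IDs are made up.
sample = {
    "VolumeStatuses": [
        {"VolumeId": "vol-0aaa", "VolumeStatus": {"Status": "ok"}},
        {"VolumeId": "vol-0bbb", "VolumeStatus": {"Status": "impaired"}},
        {"VolumeId": "vol-0ccc", "VolumeStatus": {"Status": "insufficient-data"}},
    ]
}
to_replace = impaired_volumes(sample)
print(to_replace)
```

Keeping the filter as a pure function over the response makes it easy to test without AWS credentials; a real handler would pass in the live API response instead of `sample`.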

A File System for AWS Built for Performance and Scalability

Single-stream throughput, read or write, is limited to 600 MB/s, or lower if the instance type and EBS configuration can’t sustain that upper bound. This number corresponds to the 5 Gbps single TCP flow limit that AWS enforces outside of an EC2 placement group. It can be exceeded only if cluster nodes and compute nodes are deployed in the same placement group (by default, Qumulo deploys into a cluster placement group to minimize latency between the cluster nodes).
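The relationship between the 5 Gbps flow limit and the 600 MB/s figure can be checked with a few lines of arithmetic. The raw conversion gives 625 MB/s (decimal units); we assume the slightly lower figure quoted above reflects protocol overhead. The function names and the instance and EBS numbers in the example are hypothetical.

```python
def flow_limit_mb_s(flow_limit_gbps: float = 5.0) -> float:
    """Convert a per-flow network limit from Gbps to MB/s (decimal units)."""
    return flow_limit_gbps * 1000 / 8


def effective_single_stream_mb_s(instance_mb_s: float, ebs_mb_s: float,
                                 flow_limit_gbps: float = 5.0) -> float:
    """A single stream can go no faster than the slowest of the three limits:
    the per-flow network cap, the instance bandwidth, and the EBS bandwidth."""
    return min(flow_limit_mb_s(flow_limit_gbps), instance_mb_s, ebs_mb_s)


print(flow_limit_mb_s())                          # 625.0
print(effective_single_stream_mb_s(1200, 400))    # 400: EBS is the bottleneck
print(effective_single_stream_mb_s(1200, 2000))   # 625.0: the flow limit wins
```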

Multi-stream performance varies with EBS volume configuration and EC2 instance type. Smaller instance types have less network bandwidth and less EBS bandwidth, which makes them subject to burst credits. Smaller EBS configurations are also subject to burst credits. For guaranteed performance with respect to baseline IOPS, choose at least a c5n.4xlarge instance type, then adjust the instance type to increase throughput. All-flash architectures should be chosen for high-throughput workloads, especially in clusters with smaller usable capacity, and for highly random workloads. IOPS is another factor to consider for small-file workloads or clusters with small usable capacity.
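To see why smaller EBS configurations depend on burst credits, consider gp2 volumes. The sketch below uses the gp2 characteristics as published by AWS to our knowledge (3 IOPS per GiB baseline, a 100 IOPS floor, a 16,000 IOPS ceiling, and bursting up to 3,000 IOPS); treat these constants as assumptions and verify them against the current EBS documentation before sizing a cluster.

```python
GP2_IOPS_PER_GIB = 3       # assumed baseline rate per provisioned GiB
GP2_MIN_IOPS = 100         # assumed floor for very small volumes
GP2_MAX_IOPS = 16_000      # assumed ceiling per volume
GP2_BURST_IOPS = 3_000     # assumed burst level for small volumes


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS a gp2 volume of the given size sustains without credits."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_relies_on_burst(size_gib: int) -> bool:
    """True if the volume only reaches the burst level by spending credits."""
    return gp2_baseline_iops(size_gib) < GP2_BURST_IOPS


for size in (20, 100, 1000, 6000):
    print(size, gp2_baseline_iops(size), gp2_relies_on_burst(size))
```

Under these assumptions, a 100 GiB volume sustains only 300 IOPS once its credits are spent, while a volume of 1,000 GiB or more has a baseline at or above the burst level.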

Learn more on GitHub: Qumulo Cloud Q QuickStart–Sizing and Performance on AWS (PDF)

The following graph shows the multi-stream performance for an all-flash configuration where each node hosts 8 TiB of data (please be aware that the y-axis shows the throughput in MB/s on a logarithmic scale):

Figure 2: Qumulo Cloud Q All-Flash max read performance per cluster and node count for different instance types.

The following statistics show the aggregated read latency across the Qumulo global install base, which spans clusters in the cloud and on-premises. Roughly 70% of the nodes in this install base are hybrid nodes (HDDs and SSDs). Even with the majority of nodes hosting data on HDDs, 90% of all read requests are served with latencies below 1 ms. This is a result of Qumulo’s predictive caching algorithm, which identifies I/O patterns and prefetches subsequent related data from disk into SSDs or memory to enable fast reads.

Figure 3: Aggregated read latency across Qumulo’s global install base.
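The general idea of pattern-based prefetching can be shown with a toy read-ahead detector. This is not Qumulo’s predictive caching algorithm, whose internals are not described here; the class name, the threshold, and the prefetch depth are assumptions picked for illustration. The detector only recognizes strictly sequential block reads.

```python
from typing import List, Optional


class SequentialPrefetcher:
    """Toy read-ahead: after `threshold` consecutive sequential block reads,
    suggest the next `depth` blocks for prefetching."""

    def __init__(self, threshold: int = 3, depth: int = 4) -> None:
        self.threshold = threshold
        self.depth = depth
        self._last: Optional[int] = None  # last block number read
        self._run = 0                     # length of the current sequential run

    def on_read(self, block: int) -> List[int]:
        """Record a read and return the block numbers worth prefetching."""
        if self._last is not None and block == self._last + 1:
            self._run += 1
        else:
            self._run = 1  # first read, or the pattern was broken
        self._last = block
        if self._run >= self.threshold:
            return list(range(block + 1, block + 1 + self.depth))
        return []


prefetcher = SequentialPrefetcher()
for blk in (10, 11, 12, 50):
    print(blk, prefetcher.on_read(blk))
```

Once the third sequential read arrives, the next four blocks are suggested for prefetching, so later reads can be served from faster media; the jump to block 50 resets the detector.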

Multi-Protocol File Access

Qumulo Cross-Protocol Permissions (XPP) automatically manages file access permissions across protocols. XPP enables mixed SMB and NFS protocol workflows by preserving SMB access control lists (ACLs), maintaining permissions inheritance, and reducing application incompatibility related to permissions settings.

XPP is designed to operate as follows:

  • Where there is no cross-protocol interaction, Qumulo operates precisely to protocol specifications.
  • When conflicts between protocols arise, XPP works to minimize the likelihood of application incompatibilities.
  • Enabling XPP won’t change rights on existing files on a file system. Changes may only happen if files are modified while the mode is enabled.

Qumulo XPP maintains an internal set of ACLs for every file and directory. Each ACL can contain many access control entries (ACEs) and can therefore express a complex rights structure, just like Windows or NFSv4.1. (These internal ACLs are called QACLs.) When a file is accessed through SMB or NFS, the permissions are translated or enforced in real time as the appropriate protocol permissions.

For more information, see our Qumulo Knowledge Base article on how to utilize Cross-Protocol Permissions (XPP) in Qumulo Core.

Figure 4: Translation/Enforcement for QACLs to NTFS ACLs or POSIX Permissions.

Qumulo provides a set of tools that work together to query the internal QACL structure. For example, the CLI command qq fs_get_acl will provide a list of actual QACLs of a given file or directory:

# qq fs_get_acl --path /
Control: Present
Posix Special Permissions: None

Permissions:
Position  Trustee      Type     Flags  Rights
========  ===========  =======  =====  ================================================
1         local:admin  Allowed         Delete child, Execute/Traverse, Read, Write file
2         local:Users  Allowed         Delete child, Execute/Traverse, Read, Write file
3         Everyone     Allowed         Delete child, Execute/Traverse, Read, Write file

Another interesting command is:

# qq fs_acl_explain_posix_mode --path /

The output will explain in detail how Qumulo produced the displayed POSIX mode from a file's ACL. Please refer to Cross-Protocol (XPP) Explain Permissions Tools to study an output example.
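To give a feel for what deriving a POSIX mode from an ACL involves, here is a heavily simplified sketch. It is not the translation Qumulo Core performs: the `Ace` type, the trustee classes ("owner", "group", "everyone"), the mapping of rights to r/w/x bits, and the rule that the first entry mentioning a right decides it are all assumptions. Real translations also have to handle inheritance flags, special permissions, and the fact that an "Everyone" trustee includes the owner and group.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Assumed mapping from ACL rights to POSIX permission bits (r=4, w=2, x=1).
RIGHT_TO_BIT = {"Read": 4, "Write file": 2, "Execute/Traverse": 1}
# Assumed mapping from trustee class to its bit offset in the mode.
CLASS_SHIFT = {"owner": 6, "group": 3, "everyone": 0}


@dataclass(frozen=True)
class Ace:
    trustee_class: str       # "owner", "group", or "everyone" (hypothetical)
    allowed: bool            # True for an Allowed entry, False for a Denied one
    rights: FrozenSet[str]


def approximate_posix_mode(aces: List[Ace]) -> int:
    """Evaluate ACEs in order; the first entry that mentions a right
    for a trustee class decides whether the matching mode bit is set."""
    mode = 0
    decided = set()  # (class, right) pairs settled by an earlier ACE
    for ace in aces:
        for right in ace.rights:
            key = (ace.trustee_class, right)
            if right not in RIGHT_TO_BIT or key in decided:
                continue  # rights without a POSIX equivalent are ignored
            decided.add(key)
            if ace.allowed:
                mode |= RIGHT_TO_BIT[right] << CLASS_SHIFT[ace.trustee_class]
    return mode


example = [
    Ace("owner", True, frozenset({"Read", "Write file", "Execute/Traverse", "Delete child"})),
    Ace("group", True, frozenset({"Read", "Execute/Traverse"})),
    Ace("everyone", True, frozenset({"Read"})),
]
print(oct(approximate_posix_mode(example)))  # 0o754
```

Note that "Delete child" has no direct POSIX bit in this sketch and is simply dropped, which is one reason a displayed mode can only approximate the underlying ACL.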

Next up: The Importance of Data Mobility Between Clusters and Amazon S3 

Now that we've shown you what makes Qumulo Cloud Q a unique file system on AWS and how it solves some of the most common challenges of migrating enterprise data to the cloud, in part 2 you'll learn how to copy objects between S3 buckets and clusters using Qumulo Shift. We take a high-level look at the importance of replication and data movement between data center clusters and Amazon S3, and we reveal three deployment automation options that can simplify cloud migration.

The Definitive Guide to Qumulo on AWS

Qumulo simplifies migrations to the Cloud where unstructured data is being stored in file systems, making Cloud Q for AWS an attractive choice for many workflows.

Written by Dr. Stefan Radtke, CTO, Qumulo, and Jason Westra, Solution Architect, AWS.
