Scripting Qumulo with S3 via Minio

At Qumulo, making sure customers can easily access and manage their data is hugely important as we work to fulfill our mission of becoming the company the world trusts to store its data forever. For a long time now, users have been able to interact with their data via SMB, NFS, and RESTful APIs. For most customers, these protocols meet their needs. However, a growing subset of our customers want to talk to their Qumulo cluster through an S3-compatible API in order to combine the economics and performance of file storage with modern tools written for object storage.

Object storage is an increasingly popular option for customers looking to store their data in the cloud. Even for customers who aren’t looking to leverage object storage, many tools they’re starting to use assume an object backend, and communicate via Amazon’s S3 API (which has become the de facto standard in object storage APIs).

For customers who want to interact with Qumulo via the S3 SDK or API, we recommend using Minio. Minio is a high-performance object storage server which acts as an S3-compatible frontend to a variety of cloud and local storage backends. This means you can have a Minio server sit in front of your Qumulo storage and handle S3 requests.

In this tutorial, I assume you already have a Qumulo cluster set up. If that’s not the case, please follow this tutorial first.

Deployment Model

For optimal performance, Minio’s distributed gateway model is recommended. Using a load balancer or round-robin DNS, multiple Minio instances can be spun up and connected to the same NAS. The load balancer distributes application requests across a pool of Minio servers, which talk to Qumulo via NFS. From your applications’ perspective, they’re talking to S3, while Qumulo just sees several NFS clients attached to it, so there is no need to worry about locking.
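As a rough illustration of the round-robin idea, the sketch below hands out gateway endpoints to application requests in turn. The endpoint URLs are hypothetical placeholders, and in a real deployment a load balancer or round-robin DNS would normally do this work rather than the application itself.

```python
import itertools

# Hypothetical gateway endpoints, one per Minio instance (placeholders, not a real deployment).
GATEWAY_ENDPOINTS = [
    "http://minio-gw-1:9000",
    "http://minio-gw-2:9000",
    "http://minio-gw-3:9000",
    "http://minio-gw-4:9000",
]


def endpoint_cycle(endpoints):
    """Return an endless iterator that yields endpoints in round-robin order."""
    if not endpoints:
        raise ValueError("at least one gateway endpoint is required")
    return itertools.cycle(endpoints)


def assign_requests(request_ids, endpoints):
    """Pair each request with the next endpoint in round-robin order."""
    picker = endpoint_cycle(endpoints)
    return [(request_id, next(picker)) for request_id in request_ids]


if __name__ == "__main__":
    for request_id, endpoint in assign_requests(range(6), GATEWAY_ENDPOINTS):
        print(request_id, endpoint)
```

Each endpoint returned here could be passed as the endpoint_url of an S3 client, since every gateway serves the same underlying NAS.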

Recommended Environment

Qumulo Nodes: All Qumulo products are compatible with Minio

Qumulo Client/Minio Server: 4 x uC-small (mc-14,15,18,19)

Mounts: Each client has each Qumulo node mounted with default NFS arguments

Minio server: 4 instances running via Docker on each client machine

Minio client (mc): Running x86 native on each Minio server machine

Tutorial

Download Minio

Let’s get started by downloading Minio. Minio is available for all major operating systems, and can even be run as a Docker or Kubernetes container.

Docker

$> docker pull minio/minio


Linux

$> wget https://dl.minio.io/server/minio/release/linux-amd64/minio
$> chmod +x minio


MacOS

$> brew install minio/stable/minio


Windows

Download and install via https://dl.minio.io/server/minio/release/windows-amd64/minio.exe

Running Minio in Gateway Mode

On each client machine, spin up a Minio instance in gateway mode with the command for your platform:

Docker

$> docker run -d -p 9000:9000 -e "MINIO_ACCESS_KEY=minio" -e "MINIO_SECRET_KEY=minio123" --name minio -v /mnt/minio-test:/nas minio/minio gateway nas /nas

Linux

$> ./minio gateway nas ./Path-To-Mounted-Qumulo


MacOS

$> minio gateway nas ./Path-To-Mounted-Qumulo

Windows

$> minio.exe gateway nas X:\Path-To-Mounted-Qumulo
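The Recommended Environment runs four Minio instances per client machine, and Docker needs a distinct container name and host port for each one. The sketch below generates one docker run command per instance from the Docker command shown above. The container names (minio1 to minio4) are hypothetical, and the host ports 9001 to 9004 are an assumption chosen to match the ports used by the benchmark script later in this post.

```python
import shlex


def gateway_commands(instances=4, first_host_port=9001, mount="/mnt/minio-test",
                     access_key="minio", secret_key="minio123"):
    """Build one `docker run` command per Minio NAS gateway instance.

    Container names and host ports are illustrative assumptions; each
    container still listens on port 9000 internally.
    """
    commands = []
    for index in range(instances):
        host_port = first_host_port + index
        args = [
            "docker", "run", "-d",
            "-p", f"{host_port}:9000",              # unique host port per instance
            "-e", f"MINIO_ACCESS_KEY={access_key}",
            "-e", f"MINIO_SECRET_KEY={secret_key}",
            "--name", f"minio{index + 1}",          # unique (hypothetical) container name
            "-v", f"{mount}:/nas",                  # the NFS-mounted Qumulo path
            "minio/minio", "gateway", "nas", "/nas",
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in gateway_commands():
        print(command)
```

The script only prints the commands, so they can be reviewed before being run by hand or from a shell loop.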

Test that It’s Working

To test that your Minio server is working, we’re going to install Boto3, the AWS SDK for Python, and write a simple script.

$> pip3 install boto3

I’m going to create a test script in Python called “minio-test.py” containing the code below. It uses Boto3 to read the file ‘minio-read-test.txt’ stored in the ‘minio-demo’ folder and prints the file contents to the console.

import boto3
from botocore.client import Config

# Configure S3 Connection
s3 = boto3.resource('s3',
  aws_access_key_id = 'YOUR-ACCESS-KEY-HERE',
  aws_secret_access_key = 'YOUR-SECRET-KEY-HERE',
  endpoint_url = 'YOUR-SERVER-URL-HERE',
  config=Config(signature_version='s3v4'))

# Read File (named obj to avoid shadowing Python's built-in object)
obj = s3.Object('minio-demo', 'minio-read-test.txt')
body = obj.get()['Body']
print(body.read())
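In the script above, the bucket name is the folder on the mounted Qumulo share and the object key is the file path inside that folder. The sketch below spells out that mapping in both directions. It assumes the mount point /mnt/minio-test from the Docker command earlier, and the helper names are made up for illustration.

```python
from pathlib import PurePosixPath


def nas_path(mount_point, bucket, key):
    """Return the file path that an S3 bucket/key pair corresponds to on the NAS."""
    # Strip a leading slash so the key is always treated as relative to the bucket.
    return PurePosixPath(mount_point) / bucket / key.lstrip("/")


def bucket_and_key(mount_point, file_path):
    """Return the (bucket, key) pair for a file stored under the mount point."""
    relative = PurePosixPath(file_path).relative_to(mount_point)
    if len(relative.parts) < 2:
        raise ValueError("path must be inside a bucket folder")
    return relative.parts[0], "/".join(relative.parts[1:])


if __name__ == "__main__":
    print(nas_path("/mnt/minio-test", "minio-demo", "minio-read-test.txt"))
    print(bucket_and_key("/mnt/minio-test", "/mnt/minio-test/minio-demo/minio-read-test.txt"))
```

This is handy when a file written over NFS or SMB needs to be located through the S3 API, or the other way around.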

A full code sample which shows how you can perform additional S3 operations can be found below.

Conclusion

Minio is a stable and hugely popular open source project, touting over 105 million downloads and an extremely active community, which makes us excited about customers deploying it into their environments. We’re also excited because we take our customers’ feedback seriously, and deploying Minio as a frontend for Qumulo addresses a top request for an S3 compatibility layer.

Full Code Sample

# Import AWS Python SDK
import boto3
from botocore.client import Config

bucket_name = 'minio-test-bucket' # Name of the mounted Qumulo folder
object_name = 'minio-read-test.txt' # Name of the file you want to read inside your Qumulo folder

# Configure S3 Connection
s3 = boto3.resource('s3',
  aws_access_key_id = 'YOUR-ACCESS-KEY-HERE',
  aws_secret_access_key = 'YOUR-SECRET-KEY-HERE',
  endpoint_url = 'YOUR-SERVER-URL-HERE',
  config=Config(signature_version='s3v4'))

# List all buckets
for bucket in s3.buckets.all():
  print(bucket.name)

input('Press Enter to continue...\n')

# Read File
obj = s3.Object(bucket_name, object_name)
body = obj.get()['Body']
print(body.read())
print('File Read')
input('Press Enter to continue...\n')

# Stream File - Useful for Larger Files
obj = s3.Object(bucket_name, object_name)
body = obj.get()['Body']
with open('/tmp/sample.txt', 'wb') as tmp_file:
  # Copy the object in 512-byte chunks until the stream is exhausted
  for chunk in iter(lambda: body.read(amt=512), b''):
    tmp_file.write(chunk)

print('File Streamed @ /tmp/sample.txt')
input('Press Enter to continue...\n')

# Write File
with open('./aws-write-test.txt', 'rb') as source_file:
  s3.Object(bucket_name, 'aws-write-test.txt').put(Body=source_file)
print('File Written')
input('Press Enter to continue...\n')

# Delete File (same key that was written above)
s3.Object(bucket_name, 'aws-write-test.txt').delete()
print('File Deleted')
input('Press Enter to continue...\n')

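# Stream Write File - a sketch; 'stream-write-test.txt' is a hypothetical key
with open('/tmp/sample.txt', 'rb') as source_file:
  s3.Object(bucket_name, 'stream-write-test.txt').upload_fileobj(source_file)
print('File Stream Written')
input('Press Enter to continue...\n')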

# Create Bucket
s3.create_bucket(Bucket='new-bucket')
print('Bucket Created')
input('Press Enter to continue...\n')

# Delete Bucket
bucket_to_delete = s3.Bucket('new-bucket')
for key in bucket_to_delete.objects.all():
  key.delete()

bucket_to_delete.delete()

print('Bucket Deleted')
input('Press Enter to continue...\n')

 

Performance

Performance will vary depending on how many Minio gateway instances are spun up. Generally speaking, the more parallelizable the workload and the more gateways in front of Qumulo, the better performance will be. To help customers gauge whether or not Minio might help them, we’ve published both our performance test results and our test methodology.

Test Environment

  • Qumulo Nodes: 4 x Q0626 (du19,21,23,30)
  • Qumulo Client/Minio Server: 4 x uC-small (mc-14,15,18,19)
  • Mounts: Each client has each Qumulo node mounted with default NFS arguments
  • Minio client (mc): Running x86 native on each Minio server machine
  • Minio server: 4 instances running via Docker on each client machine with the command below

$> docker run -d -p 9000:9000 -e "MINIO_ACCESS_KEY=minio" -e "MINIO_SECRET_KEY=minio123" --name minio -v /mnt/minio-test:/nas minio/minio gateway nas /nas

Single-Stream Write: 84MB/s

Streamed zeros to Qumulo via Minio client’s mc pipe command:

$> dd if=/dev/zero bs=1M count=10000 | ./mc pipe minio1/test/10Gzeros
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 124.871 s, 84.0 MB/s

Using Qumulo analytics, we see a mixture of read and write IOPS during this time, with the reads coming from the .minio.sys/multipart directory.

This is due to the way the S3 protocol deals with large files: the file is uploaded in chunks and then reassembled into the final file from those parts. In NAS gateway mode, Minio implements this behavior by writing each chunk to its own temporary file, then reading the chunks back and appending them in order to the final file. Essentially, there is a write amplification factor of 2x and an extra read of all of the data that was written.
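To make the cost concrete, the sketch below works through the arithmetic for the 10 GB upload above. The 64 MiB part size is purely an assumption for illustration (the part size used by mc in this test is not stated here), and the function name is hypothetical. The 2x write and 1x read factors come from the description above.

```python
def multipart_nas_io(object_bytes, part_bytes):
    """Estimate NAS traffic for one multipart upload through the NAS gateway.

    Every byte is written twice (temporary part file, then final file)
    and read once (when the parts are appended to the final file).
    """
    parts = -(-object_bytes // part_bytes)  # ceiling division
    return {
        "parts": parts,
        "nas_bytes_written": 2 * object_bytes,
        "nas_bytes_read": object_bytes,
    }


if __name__ == "__main__":
    upload_bytes = 10_485_760_000        # size reported by dd in the test above
    assumed_part = 64 * 1024 * 1024      # assumed part size, not taken from the test
    print(multipart_nas_io(upload_bytes, assumed_part))
```

Under these assumptions a 10 GB upload produces roughly 21 GB of writes and 10.5 GB of reads on the cluster, which is worth keeping in mind when sizing for write-heavy S3 workloads.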

Single-Stream Read: 643MBps

Streamed back the file I wrote via Minio’s “mc cat” command, making sure to drop Linux filesystem cache and Qumulo cache first:

$> /opt/qumulo/qq_internal cache_clear
$> echo 1 > /proc/sys/vm/drop_caches

$> ./mc cat minio1/test/10Gzeros | dd of=/dev/null bs=1M
524+274771 records in
524+274771 records out
10485760000 bytes (10 GB) copied, 16.3165 s, 643 MB/s
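The headline numbers can be checked directly from the dd output: throughput is bytes copied divided by elapsed seconds, and dd reports it in decimal megabytes (1 MB = 1,000,000 bytes). A quick sketch of that arithmetic, using the figures printed above:

```python
def throughput_mb_per_s(total_bytes, seconds):
    """Throughput in decimal megabytes per second, the unit dd reports."""
    return total_bytes / seconds / 1_000_000


# Figures copied from the dd output in the single-stream tests.
write_rate = throughput_mb_per_s(10_485_760_000, 124.871)
read_rate = throughput_mb_per_s(10_485_760_000, 16.3165)

print(f"write: {write_rate:.1f} MB/s")  # prints "write: 84.0 MB/s"
print(f"read: {read_rate:.0f} MB/s")    # prints "read: 643 MB/s"
```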

 


Multi-Stream Write: ~600MBps-1GBps

This test was run with 32 parallel 10GB write streams, each started in the manner described above (2 per Minio instance).
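A minimal sketch of how such a run could be launched from one client machine is shown below. It reuses the dd and mc pipe command from the single-stream test. The mc aliases minio2 to minio4 and the numbered object names are hypothetical, since only minio1/test/10Gzeros appears in the original test. With 4 instances per client and 2 streams per instance, each of the 4 clients starts 8 streams, for 32 in total.

```python
import subprocess


def write_stream_commands(instances=4, streams_per_instance=2, count_mib=10000):
    """Build one dd | mc pipe shell command per write stream.

    Alias names (minio1..minioN) and object suffixes are illustrative assumptions.
    """
    commands = []
    for instance in range(1, instances + 1):
        for stream in range(1, streams_per_instance + 1):
            target = f"minio{instance}/test/10Gzeros-{stream}"
            commands.append(
                f"dd if=/dev/zero bs=1M count={count_mib} | ./mc pipe {target}"
            )
    return commands


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them to finish."""
    processes = [subprocess.Popen(command, shell=True) for command in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Print only; call run_in_parallel(write_stream_commands()) to actually start the streams.
    for command in write_stream_commands():
        print(command)
```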

Multi-Stream Read: 1.1-1.7GBps

This test was run with 32 parallel 10GB read streams, each started in the manner described above (2 per Minio instance).

S3 Benchmarks

Using Minio’s modified version of Wasabi Tech’s S3 benchmark, we were able to produce the following results from our test environment. The benchmark needed to be modified because the original assumes support for object versioning, which Minio in gateway mode does not support.

Single Client

$> ./s3-benchmark -a minio -s minio123 -u http://localhost:9001 -t 100
Wasabi benchmark program v2.0
Parameters: url=http://localhost:9001, bucket=wasabi-benchmark-bucket, duration=60, threads=100, loops=1, size=1M
Loop 1: PUT time 60.2 secs, objects = 7562, speed = 125.5MB/sec, 125.5 operations/sec.
Loop 1: GET time 60.2 secs, objects = 23535, speed = 390.8MB/sec, 390.8 operations/sec.
Loop 1: DELETE time 17.7 secs, 427.9 deletes/sec.
Benchmark completed.


Multi-Client

In this variant of the test, we ran one instance of s3-benchmark per Minio instance for a grand total of 16 concurrent instances. Each s3-benchmark run was assigned its own bucket. In aggregate, write speeds reached about 700MBps, while read speeds peaked at 1.5GBps and then tailed off.

By upping the file size to 16MiB, I was able to achieve about 1.5-1.8 GBps aggregate write throughput and 2.5 GBps aggregate read throughput at peak. Higher write throughput is possible by specifying more threads, but Minio started returning 503 errors, which is probably a result of running four Minio containers per client machine.

The following Bash script was run on each of the client machines:

for i in $(seq 1 4); do 
   s3-benchmark/s3-benchmark -a minio -s minio123 -u http://localhost:900$i -b $HOSTNAME-$i -t 20 -z 16M & 
done;
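With 16 benchmark instances running at once, the per-instance results have to be added up by hand or by script. The sketch below parses the "Loop" lines in the format shown in the single-client output and sums the reported speeds. Summing per-instance averages is only an approximation of aggregate throughput and assumes the runs overlap fully; the function and variable names are hypothetical.

```python
import re

# Matches lines such as:
# Loop 1: PUT time 60.2 secs, objects = 7562, speed = 125.5MB/sec, 125.5 operations/sec.
LINE_PATTERN = re.compile(
    r"Loop \d+: (PUT|GET) time [\d.]+ secs, objects = \d+, speed = ([\d.]+)MB/sec"
)


def aggregate_throughput(outputs):
    """Sum the PUT and GET speeds (MB/sec) reported across several benchmark outputs."""
    totals = {"PUT": 0.0, "GET": 0.0}
    for output in outputs:
        for operation, speed in LINE_PATTERN.findall(output):
            totals[operation] += float(speed)
    return totals


SAMPLE_OUTPUT = """\
Loop 1: PUT time 60.2 secs, objects = 7562, speed = 125.5MB/sec, 125.5 operations/sec.
Loop 1: GET time 60.2 secs, objects = 23535, speed = 390.8MB/sec, 390.8 operations/sec.
Loop 1: DELETE time 17.7 secs, 427.9 deletes/sec.
"""

if __name__ == "__main__":
    # Two copies of the single-client output stand in for two concurrent instances.
    print(aggregate_throughput([SAMPLE_OUTPUT, SAMPLE_OUTPUT]))
```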

S3 Compatibility

The following S3 APIs are not supported by Minio.

Bucket APIs

  • BucketACL (Use bucket policies instead)
  • BucketCORS (CORS enabled by default on all buckets for all HTTP verbs)
  • BucketLifecycle (Not required for Minio erasure coded backend)
  • BucketReplication (Use mc mirror instead)
  • BucketVersions, BucketVersioning (Use s3git)
  • BucketWebsite (Use caddy or nginx)
  • BucketAnalytics, BucketMetrics, BucketLogging (Use bucket notification APIs)
  • BucketRequestPayment
  • BucketTagging

Object APIs

Object name restrictions on Minio

Object names that contain characters `^*|" are unsupported on Windows and other file systems which do not support filenames with these characters.
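A simple pre-flight check can catch such names before an upload is attempted. The sketch below reads the character list in the sentence above literally (including the backtick), which is an assumption; adjust the set to match the file systems you actually target. The function name is hypothetical.

```python
# Characters taken literally from the restriction above (assumed reading of the list).
UNSUPPORTED_CHARS = frozenset('`^*|"')


def unsupported_chars_in(object_name, unsupported=UNSUPPORTED_CHARS):
    """Return the set of restricted characters found in an object name."""
    return set(object_name) & set(unsupported)


if __name__ == "__main__":
    for name in ["minio-read-test.txt", "reports/2018|q1.txt"]:
        found = unsupported_chars_in(name)
        status = "ok" if not found else f"contains {sorted(found)}"
        print(f"{name}: {status}")
```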
