How to Copy Objects Between S3 Buckets and Clusters (Part 2)

This 2-part series explores what makes Qumulo’s scale-out file system unique on AWS. In part 1, we described how the Qumulo Cloud Q software architecture is built for performance, dynamic scalability, and multi-protocol file access. These are all attributes that Qumulo brings to high-performance workloads running in the cloud. In part 2 below, we’ll continue with what makes Qumulo unique on AWS, focusing on how to copy objects to (and from) Amazon S3 using Qumulo Shift, and how to automate the deployment of Qumulo Cloud Q on AWS to simplify cloud migration.

Understanding Replication and Data Movement Between Data Center Clusters and Amazon S3

There are a growing number of workflows where data needs to be moved between the file system and an S3 bucket. For example, as a media content editor or artist, you typically use a shared file system to merge special effects or collaborate with other artists. Then, you might use other AWS services for transcoding files that sit in an S3 bucket. Another example is genome sequencing, where sequencers write via SMB, analytics processes read the data through NFS, and archiving is done to S3.

Given the above, data mobility between clusters and Amazon S3 becomes all the more important.

Object Storage vs. File Storage

Historically, object storage solutions were not designed to enable the easy movement of file-based data into a cloud object store (Amazon S3 bucket) so that it can be used with cloud services. (Related: Block Storage vs Object Storage vs File Storage: What’s the Difference?) Take high-performance active workloads, for instance: video editing, special effects, genomic sequencing — these workflows need specialized services to be fully realized, such as transcoding or media processing, machine learning, and data analytics, all of which are available as AWS services.

Qumulo’s file-based storage technology has a built-in feature, called Qumulo Shift, which allows data administrators to create a relationship between a directory and an S3 bucket. Where and when needed, data can be copied from the directory to the S3 bucket.


Figure 1: Copy relationships between directories and Amazon S3 buckets

As expected from a modern enterprise data storage solution, Qumulo can replicate data between different clusters, and Qumulo Shift makes that possible. The location of the Qumulo cluster is irrelevant in this case: source and target clusters can reside on-premises, in different Availability Zones (for example, one for the primary Qumulo cluster and another for a disaster recovery cluster), in different virtual private clouds (VPCs), and even in different clouds.

What Is Qumulo Shift for Amazon S3?

Qumulo Shift for Amazon S3 is a free service offered as part of Qumulo Cloud Q for AWS that allows you to copy files from a directory in a cluster to a folder in an Amazon S3 bucket in native object format. Qumulo Shift enables data-driven businesses to control the costs of a high-performing data strategy, improving your ROI.

Qumulo Shift is an integral component of any Qumulo deployment and gives you a seamless data pipeline to and from S3 storage. Using Qumulo Shift for Amazon S3, enterprises can copy objects from any Qumulo cluster — whether on-premises or already running in a choice of clouds — to Amazon’s Simple Storage Service cloud object store (Amazon S3).

Whether you are creating data with file-based applications or you need a backup/archive repository or a staging point for any of the hundreds of cloud-native data analytics and transformative tools that AWS offers, Qumulo Shift enables you to easily move files between your Qumulo storage and Amazon S3.

Copy native files from a directory in a cluster to a folder in an Amazon S3 bucket in native object format

This feature allows you to put native file data from your Qumulo cluster, whether it’s on-premises or in the cloud, into an S3 bucket in native S3 object format. That native part is important: it means no proprietary formatting is applied, so you’re free to innovate with powerful AWS services and Marketplace apps against your S3 dataset.

How Qumulo Shift Works

Qumulo Shift makes it possible to create a replication relationship between an on-premises Qumulo cluster and an Amazon S3 bucket. To see it in action, watch the short demo video below, in which Qumulo Product Manager Scott Gentry shows how to make data created in a data center cluster available to AWS services using S3 storage.

How to Copy Objects Between S3 Buckets and Clusters Using Qumulo Shift

Qumulo Shift replication allows you to copy objects from a directory in a cluster to a folder in an Amazon S3 bucket (cloud object store). When creating a replication relationship between a cluster and an S3 bucket, Qumulo Core performs the following steps.

  1. Qumulo Core verifies that the specified source directory exists on the file system and that the S3 bucket exists, is accessible with the specified credentials, and contains downloadable objects.
  2. Once the relationship is created successfully, a job is started on one of the nodes in the cluster.
    Note: When performing multiple Shift operations, multiple nodes are used.
  3. The job takes a temporary snapshot of the source directory (named, for example, replication_to_bucket_my_bucket) to ensure that the copy is point-in-time consistent.
  4. Qumulo Shift then recursively traverses the directories and files in that snapshot, copying each file to a corresponding object in S3.
  5. File paths in the source directory are preserved in the keys of replicated objects. For example, the file /my-dir/my-project/file.txt is uploaded as the object https://my-bucket.s3.us-west-2.amazonaws.com/my-folder/my-project/file.txt.

The data is not encoded or transformed in any way, but only data in a regular file's primary stream is replicated (alternate data streams and file system metadata such as ACLs are not included). Any hard links to a file within the replication source directory are also replicated to Amazon S3 as a full copy of the object, with identical contents and metadata. However, this copy is performed using a server-side S3 copy operation to avoid transferring the data across the internet again.
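The path-to-key mapping from step 5 can be sketched in a few lines of Python. This is an illustration only, not Qumulo's implementation; the function names are ours:

```python
def shift_object_key(source_dir: str, file_path: str, folder: str) -> str:
    """Map a file path under the replication source directory to its S3 object key."""
    prefix = source_dir.rstrip("/") + "/"
    if not file_path.startswith(prefix):
        raise ValueError("file is outside the replication source directory")
    # The path relative to the source directory becomes the key, under the target folder.
    return folder.strip("/") + "/" + file_path[len(prefix):]

def shift_object_url(bucket: str, region: str, key: str) -> str:
    """Build the virtual-hosted-style URL for the replicated object."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

key = shift_object_key("/my-dir", "/my-dir/my-project/file.txt", "my-folder")
print(shift_object_url("my-bucket", "us-west-2", key))
# https://my-bucket.s3.us-west-2.amazonaws.com/my-folder/my-project/file.txt
```

Because only the directory-relative part of the path goes into the key, the same tree structure is reproduced under the target folder in the bucket.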

When copying objects between S3 buckets and clusters, Qumulo Shift checks whether a file was previously replicated to S3 using Shift. If the resulting object still exists in the target S3 bucket, and neither the file nor the object has been modified since the last successful replication, the file's data is not re-transferred to S3. Qumulo Shift never deletes files in the target folder on S3, even if they have been removed from the source directory since the last replication.
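The skip decision described above boils down to three conditions. Here is a minimal sketch of that logic; the function and flag names are ours, purely for illustration:

```python
def needs_transfer(object_exists: bool, file_modified: bool, object_modified: bool) -> bool:
    """Re-transfer a file only if its object is missing from the target bucket,
    or if either the file or the object changed since the last successful replication."""
    return (not object_exists) or file_modified or object_modified

# Unchanged on both sides and still present in the bucket: the copy is skipped.
print(needs_transfer(object_exists=True, file_modified=False, object_modified=False))
# False
```

Note that deletion is not one of the outcomes: a file removed from the source simply leaves its object untouched in the bucket.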

How to Copy Objects from a Cluster to an Amazon S3 Bucket

To copy objects from a directory in a cluster to a folder in an Amazon S3 bucket using the Qumulo Web UI 3.2.5 (and higher), follow these steps:

  1. Log in to Qumulo Core.
  2. Click Cluster > Copy to/from S3.
  3. On the Copy to/from S3 page, click Create Copy.
  4. On the Create Copy to/from S3 page, click Local ⇨ Remote and then enter the following:
    a. The Directory Path on your cluster (/ by default)
    b. The S3 Bucket Name
    c. The Folder in your S3 bucket (/ by default)
    d. The Region for your S3 bucket
    e. Your AWS Access Key ID and Secret Access Key
  5. (Optional) For additional configuration, click Advanced S3 Server Settings.
  6. Click Create Copy.
  7. In the Create Copy to S3? dialog box, review the Shift relationship and then click Yes, Create.

The copy job begins.

For more information about using Qumulo Shift to copy objects from a cluster to an Amazon S3 bucket, visit our Documentation Portal (docs.qumulo.com) for a step-by-step guide, steps for troubleshooting copy job issues, and other best practices.

How to Copy Objects from an S3 Bucket to a Cluster

A new feature of Qumulo Shift, called Qumulo Shift-From, was released with Qumulo Web UI 4.2.3. This feature allows data administrators to create relationships where the S3 bucket is the source and a Qumulo directory is the target, allowing users to shift data from S3 to Qumulo as well as from Qumulo to S3.

To copy objects from a folder in an Amazon S3 bucket to a directory in a Qumulo cluster, follow these steps.

  1. Log in to Qumulo Core.
  2. Click Cluster > Copy to/from S3.
  3. On the Copy to/from S3 page, click Create Copy.
  4. On the Create Copy to/from S3 page, click Local ⇦ Remote and then enter the following:
    a. The Directory Path on your cluster (/ by default)
    b. The S3 Bucket Name
    c. The Folder in your S3 bucket (/ by default)
    d. The Region for your S3 bucket
    e. Your AWS Access Key ID and Secret Access Key
  5. (Optional) For additional configuration, click Advanced S3 Server Settings.
  6. Click Create Copy.
  7. In the Create Copy from S3? dialog box, review the Shift relationship and then click Yes, Create.

The copy job begins and Qumulo Core estimates the work to be performed. When the estimation is complete, the Web UI displays a progress bar with a percentage for a relationship on the Replication Relationships page. The page also displays the estimated total work, the remaining bytes and files, and the estimated time to completion for a running copy job.

Note: For work estimates, Qumulo Shift-from-S3 jobs calculate the total number of files and bytes in a job's bucket prefix. This requires the job to use the ListObjectsV2 S3 action once per 5,000 objects (or 200 times per 1 million objects).
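At one ListObjectsV2 action per 5,000 objects, the number of calls for an estimate is a simple ceiling division (the function name here is ours, for illustration):

```python
import math

def estimate_list_calls(object_count: int, objects_per_call: int = 5000) -> int:
    """Number of ListObjectsV2 actions a work estimate needs,
    at one call per 5,000 objects under the bucket prefix."""
    return math.ceil(object_count / objects_per_call)

print(estimate_list_calls(1_000_000))  # 200
print(estimate_list_calls(12_001))     # 3
```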

For additional information about copying objects from an Amazon S3 bucket to a directory in a Qumulo cluster, visit the Qumulo Documentation Portal, which includes steps for troubleshooting copy job issues and other best practices.

Deployment Automation to Simplify Cloud Migration

Deploying infrastructure by using code has many advantages: deployments are consistent and repeatable, provisioning is much faster, and configuration drift is easy to identify. Infrastructure as code is also less error-prone and scales to large deployments.

How to Deploy a Qumulo Cluster in AWS

There are three options to deploy a Qumulo cluster in AWS in an automated way. These are:

  1. By using the AWS Quick Start for Qumulo Cloud Q, an automated reference deployment built by Amazon Web Services (AWS) and Qumulo. The underlying AWS CloudFormation templates automate all required steps to build a Qumulo cluster according to best practices, so you can build and start using your environment within minutes.
  2. The CloudFormation Template that is provided by each Cluster type in the AWS Marketplace.
  3. The AWS Terraform Templates provided by Qumulo on GitHub.

Why Deploy Clusters Using the AWS Quick Start for Qumulo Cloud Q

We recommend deploying Qumulo clusters with the AWS Quick Start for Qumulo Cloud Q, primarily because the Quick Start is backed by AWS CloudFormation templates that simplify and speed up the deployment. Using the Quick Start to deploy the full capabilities of Qumulo Cloud Q on AWS takes about 15 minutes.

However, you can also use the CloudFormation templates provided in the AWS Marketplace; they deploy just the basic cluster and two Lambda functions. These serverless functions collect telemetry data from the cluster and send it to Amazon CloudWatch, and they monitor the health of all EBS volumes, replacing them automatically in case of volume failures.

Automated Deployment Options to Deploy Qumulo Clusters

The following table lists the different automated deployment options currently available to deploy Qumulo clusters.


Table 1: Automated Deployment Options

Unique Features Come Standard with Qumulo's File Storage on AWS

Qumulo’s Hybrid Cloud File Storage on AWS simplifies migrations to the cloud where unstructured data is stored in file systems, regardless of whether data access is through SMB, NFS, FTP, or HTTP. File locking and access control work across all protocols, so redundant data placement for each protocol can be avoided.

Qumulo’s cloud-native software, Qumulo Cloud Q for AWS, can deliver tens of GB/s of throughput with latencies between 0.5 and 5 ms. It allows easy data movement between the file system and Amazon S3 buckets, integrates through deployment templates, and is available by subscription through the AWS Marketplace.

As shown below, and described in part 1 of this series, a number of unique features that come standard with a Qumulo Cloud Q software subscription make it an attractive choice on AWS for a variety of high-performance use cases and workflows.

Scalable file counts and high performance file operations

We encourage you to explore AWS and Qumulo file data services. You can find more information on Qumulo Care here: Qumulo in AWS: Getting Started.

Another option is the Qumulo Studio Q Quick Start, which spins up a complete post-production environment in the cloud for remote video editing, including a Qumulo cluster and Adobe Creative Cloud. Lastly, Qumulo can also be deployed as the file system option for AWS Nimble Studio.

The Definitive Guide to Qumulo on AWS

Qumulo simplifies migrations to the Cloud where unstructured data is being stored in file systems, making Cloud Q for AWS an attractive choice for many workflows.
