by Ferhat Kaddour, Vice President of Sales and Alliances, Atempo
Surprisingly, when purchasing and implementing new storage technology, the actual data migration phase is not always adequately thought through. This can, and often does, lead to issues and user dissatisfaction.
For years, free tools such as robocopy were enough to manage migrations. Today’s environment is far more complex: migrations now involve billions of files of all sizes, millions of directories, petabytes of data and high data change rates. System integrators and end users have to plan their migrations carefully.
Several operations are critical: understanding the source data, creating a plan to minimize migration time, taking storage performance and network capabilities into account, using a solution that can optimize read/write performance, and ensuring a proper cutover while minimizing operational downtime.
Qumulo and Atempo have joined forces to offer end users turnkey migration solutions that mitigate these challenges by leveraging new technologies and high-performance data movement.
No two data migration projects are the same, but each one requires some serious groundwork including scope assessment, timeline definitions, software and hardware provisioning, and licensing. When the migration is running, you also need daily status updates and monitoring until the final cutover from legacy to new storage. Atempo partners with Qumulo’s Customer Success team to attain this cutover with guaranteed data integrity and minimal legacy storage downtime.
To do so, we combine the skilled professional services teams of Qumulo and Atempo with Atempo’s Miria Data Movement technology, which optimizes read operations on the legacy storage while writing at full speed to the Qumulo target.
Scalability and performance are two key success factors:
Scalability. Just as Qumulo storage clusters can scale to hundreds of nodes, Atempo’s data moving software scales by adding more Data Mover machines, which increases overall transfer speed by parallelizing data flows. Each Data Mover can move several GB/s in a highly parallelized, multi-threaded environment. In fact, a pool of Data Movers easily delivers more throughput than most networks can handle.
Performance. Each physical Atempo Data Mover works to ensure there are no bottlenecks on the source cluster nodes, and no slowdown in the data migration itself or on the destination nodes. Efficient load balancing orchestrates the entire process from source to target. A single throughput bottleneck in a petabyte-scale migration can add weeks to a job. Naturally, production downtime on the legacy storage is not an option when migrations can run for several weeks. Balancing data flows, maximizing available network bandwidth, and even restricting migration IOs during office hours are part of a typical Atempo migration workflow.
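To make the interplay between Data Mover count, network capacity and office-hours throttling concrete, here is a rough back-of-the-envelope sketch. It is not an Atempo sizing tool: the function names, the 10-hour office window, the 20% daytime throttle and the throughput figures are all illustrative assumptions, and the model ignores per-file overhead, which tends to dominate when a dataset contains very large numbers of small files.

```python
def peak_throughput_gb_s(movers: int, per_mover_gb_s: float,
                         network_cap_gb_s: float) -> float:
    """Aggregate throughput is capped by the slower of the mover pool and the network."""
    return min(movers * per_mover_gb_s, network_cap_gb_s)


def estimate_days(total_tb: float, peak_gb_s: float,
                  office_fraction: float, office_hours: int = 10) -> float:
    """Days needed when migration IO is throttled to office_fraction of peak
    during office hours and runs at full peak for the rest of the day.

    Uses decimal units (1 TB = 1000 GB) and ignores per-file overhead.
    """
    off_hours = 24 - office_hours
    gb_per_day = 3600 * peak_gb_s * (office_hours * office_fraction + off_hours)
    return total_tb * 1000 / gb_per_day


# Hypothetical figures: 4 movers at 2 GB/s each behind a 5 GB/s network link,
# throttled to 20% of peak during a 10-hour office day.
peak = peak_throughput_gb_s(movers=4, per_mover_gb_s=2.0, network_cap_gb_s=5.0)
days = estimate_days(total_tb=500, peak_gb_s=peak, office_fraction=0.2)
print(f"{peak:.1f} GB/s peak, about {days:.1f} days")
```

With these assumed numbers the network, not the mover pool, is the limiting factor, and 500 TB would take a little under two days of pure streaming time. Real projects with hundreds of millions of small files will take considerably longer, which is why removing bottlenecks matters so much.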
The following diagram illustrates a workflow for migration to Qumulo’s file storage. In this case, the customer had almost 1 billion files to move, amounting to 500 TB of data.
Qumulo and Atempo do not use NDMP (Network Data Management Protocol) for data movement. The protocol can no longer cope with the volumes involved in most migration projects: above around 100 million files and/or 100 TB of data, NDMP hits its operational ceiling. NDMP was designed principally for tape media and is ill-suited to today’s significant shift toward higher volumes of unstructured file data made up of many more, smaller files.
The initial data migration job needs to be followed by a number of incremental synchronizations before the final cutover. Re-scanning the entire file system (or “tree walking”) to check for changes is absolutely not an option here. Instead, Qumulo and Atempo combine powerful metadata management features that flag only new, modified or deleted objects. These objects make up each successive incremental synchronization until the storage cutover occurs.
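The classification step behind an incremental synchronization can be sketched as a comparison of two metadata snapshots. This is only an illustration of the idea, not how Qumulo or Miria implement it: the `ChangeSet` type, the `diff_snapshots` function and the choice of `(size, mtime)` as the change signature are assumptions, and the sketch deliberately leaves out how the snapshots are obtained without walking the tree.

```python
from typing import Dict, NamedTuple, Set, Tuple

# Hypothetical change signature: (size in bytes, modification time).
Signature = Tuple[int, float]


class ChangeSet(NamedTuple):
    new: Set[str]
    modified: Set[str]
    deleted: Set[str]


def diff_snapshots(previous: Dict[str, Signature],
                   current: Dict[str, Signature]) -> ChangeSet:
    """Classify paths by comparing two metadata snapshots keyed by path."""
    prev_paths = previous.keys()
    cur_paths = current.keys()
    new = set(cur_paths - prev_paths)
    deleted = set(prev_paths - cur_paths)
    # A path present in both snapshots is modified if its signature differs.
    modified = {p for p in prev_paths & cur_paths if previous[p] != current[p]}
    return ChangeSet(new=new, modified=modified, deleted=deleted)


before = {"/a.dat": (10, 1.0), "/b.dat": (20, 1.0), "/c.dat": (30, 1.0)}
after = {"/a.dat": (10, 1.0), "/b.dat": (25, 2.0), "/d.dat": (5, 2.0)}
changes = diff_snapshots(before, after)
print(sorted(changes.new), sorted(changes.modified), sorted(changes.deleted))
```

Only the paths in the resulting change set need to be handled in the next synchronization pass; unchanged files such as `/a.dat` are never touched again.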
The Miria server controls all IOs and manages tasks, processes and components for the complete Miria infrastructure. Once a scheduled migration task is triggered, the following occurs:
- The Miria server maps the request to the Miria Data Mover infrastructure and defines the required workload.
- Each Data Mover opens multiple threads to the storage being migrated. These threads begin collecting and moving files and their associated metadata, including ACLs, file encoding details, user rights, groups and advanced shared file system rights, as well as symlinks. The Data Mover reliably loads data onto the target storage.
- The Miria database stores metadata and details on migration jobs. Miria performs constant checks within the data collection and storage layers to ensure full and reliable data integrity.
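The general pattern described in these steps, several threads copying files in parallel with an integrity check on each one, can be sketched with the Python standard library. This is a simplified illustration and not Miria’s implementation: the function names are hypothetical, `shutil.copy2` preserves timestamps and permission bits but not ACLs or extended rights, and symlinks are not handled.

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_root: Path, target_root: Path, relative_path: str) -> str:
    """Copy one file, then compare source and target checksums."""
    src = source_root / relative_path
    dst = target_root / relative_path
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # data, timestamps and permission bits; no ACLs
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"checksum mismatch for {relative_path}")
    return relative_path


def migrate(source_root: Path, target_root: Path,
            relative_paths: Iterable[str], threads: int = 8) -> List[str]:
    """Copy files in parallel; returns the paths that were copied and verified."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(
            lambda rel: copy_and_verify(source_root, target_root, rel),
            relative_paths,
        ))
```

Calling `migrate(Path("/mnt/legacy"), Path("/mnt/qumulo"), ["projects/a.dat"])` with two hypothetical mount points would copy and verify that one file. A production data mover would additionally spread the file list across several machines and carry over the full set of rights and metadata listed above.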
Atempo’s FastScan rapidly collects and processes the list of new, changed or deleted files, minimizing the load on the source storage and avoiding a full scan. This file list matters when a migration stops partway through a synchronization: files that were already migrated do not need to be migrated again. FastScan improves the incremental migration process and ensures that restarts never begin from scratch.
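One simple way to picture a restart that does not begin from scratch is a journal of completed files that is consulted when a synchronization resumes. The sketch below is a generic illustration and makes no claim about how FastScan works internally: the journal file, its one-path-per-line format and the function names are assumptions, and paths containing newlines are not supported.

```python
from pathlib import Path
from typing import Iterable, List, Set


def load_completed(journal: Path) -> Set[str]:
    """Read the set of paths already migrated in an interrupted run."""
    if not journal.exists():
        return set()
    with journal.open("r", encoding="utf-8") as handle:
        return {line.rstrip("\n") for line in handle if line.strip()}


def record_completed(journal: Path, relative_path: str) -> None:
    """Append a path to the journal once its transfer has been verified."""
    with journal.open("a", encoding="utf-8") as handle:
        handle.write(relative_path + "\n")


def remaining_work(file_list: Iterable[str], journal: Path) -> List[str]:
    """Return the files from the change list that still need migrating."""
    done = load_completed(journal)
    return [path for path in file_list if path not in done]
```

After an interruption, passing the same change list and journal to `remaining_work` yields only the files that were not yet transferred, so the synchronization picks up where it stopped.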
Last but not least, once data migration is complete, you can opt to continue using the hardware and software components used for migration. Leave the Atempo kit where it is and use it to back up, archive and synchronize (or “permanently migrate”) your Qumulo storage data to another destination (secondary storage, tape, cloud…).
Thanks to our strong technological partnership, Atempo’s FastScan is also available when Qumulo is the source storage. This provides a powerful data protection system with incremental-forever backups that outperforms legacy NDMP-based backup solutions, both in efficiency when handling billions of small files with a high change rate and in scalability for on-premises or to-cloud workflows.
Check out the recent blog post by Stefan Radtke, Qumulo’s EMEA Technical Director.
More on this in another post, stay tuned!