By Henry Precheur and Kevin Jamieson

Qumulo’s continuous replication ensures that an up-to-date copy of a cluster’s data is mirrored to an off-site Qumulo cluster or to the cloud. This allows our customers to sleep soundly at night knowing that they can quickly recover in the event of a natural disaster or catastrophic failure.

How does a file system that scales to billions of files and petabytes of data manage to keep a copy of data that is typically up-to-date within minutes?

Traditional tools for incremental backup like rsync and robocopy just don’t scale. These tools identify changes between a source and destination directory by walking the entire directory tree and comparing the attributes (typically size and modification time) or the data of every file.

Even when only a handful of files have changed between incremental runs, traversing billions of files to identify those changes may take hours or days, even with parallelization.

At Qumulo, one of our mantras is “no tree walks.” Qumulo’s continuous replication is powered by the same real-time aggregation that enables our efficient analytics and quota features to avoid tree walks. Together with snapshots, this allows any changes in the file system to be efficiently identified and replicated with the minimum amount of time and I/O.

Continuous replication works by continually taking a new snapshot, comparing that snapshot against the last replicated snapshot, and replicating the differences. Each snapshot has a unique, auto-incrementing identifier or “epoch,” and associated with every file and directory is something we call a “last modified epoch.” This acts like a timestamp describing the last snapshot in which that file or directory changed.

Every time a file is modified, its last modified epoch is updated. This way we know if a file has changed without looking at its contents, but that alone is not enough to locate a changed file without a tree walk. We also update the last modified epoch of the file’s parent directory, and its parent’s parent directory, and so on all the way up to the root directory of the file system. This leaves a trail to quickly find all the changes in the file system between snapshots.
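The upward propagation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Qumulo's implementation: the `Node` class, the `mark_modified` function, and the file names are all hypothetical.

```python
class Node:
    """A file or directory. Hypothetical structure, for illustration only."""

    def __init__(self, name, parent=None, epoch=1):
        self.name = name
        self.parent = parent
        self.epoch = epoch  # last modified epoch


def mark_modified(node, current_epoch):
    """Stamp the node and every ancestor up to the root with current_epoch."""
    while node is not None:
        node.epoch = current_epoch
        node = node.parent


home = Node("home")
alice = Node("alice", parent=home)
bob = Node("bob", parent=home)
report = Node("report.doc", parent=alice)

# Modifying report.doc during epoch 2 touches only its own ancestors.
mark_modified(report, current_epoch=2)
print([(n.name, n.epoch) for n in (home, alice, report, bob)])
# prints [('home', 2), ('alice', 2), ('report.doc', 2), ('bob', 1)]
```

In this sketch the cost of recording a change grows with the depth of the modified file in the tree, not with the total number of files.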

When continuous replication runs, we compare these last modified epochs against the epoch of the last replicated snapshot, starting with the root directory. If this comparison shows the directory has not changed since the last time replication ran, we can skip it and all of its contents. If the directory has changed, we recursively compare the last modified epoch of every file and sub-directory under that directory to find out which files have changed and must be copied to the remote cluster.

Consider this simple example:
/home (epoch=1)
|– alice/ (epoch=1)
|  |– diagram.svg (epoch=1)
|  `– report.doc (epoch=1)
`– bob/ (epoch=1)
   `– project.psd (epoch=1)

Initially, every file and directory has its last modified epoch set to 1. When Alice updates report.doc in /home/alice, the last modified epoch of report.doc and of each of its ancestor directories is set to 2. The file system then looks like this:
/home (epoch=2)
|– alice/ (epoch=2)
|  |– diagram.svg (epoch=1)
|  `– report.doc (epoch=2)
`– bob/ (epoch=1)
   `– project.psd (epoch=1)

The unchanged epoch of /home/alice/diagram.svg tells continuous replication to skip that file and copy only the updated report.doc, while the unchanged epoch of /home/bob tells it to skip that directory entirely without examining any of its contents.
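To make the pruning concrete, here is a minimal Python sketch that runs the comparison on the example tree. The nested-dict layout and the `changed_files` function are assumptions made for illustration. The sketch reports only changed files, whereas a real replication system would also need to handle changes to the directories themselves.

```python
def changed_files(entry, path, last_replicated_epoch):
    """Yield paths of files modified after last_replicated_epoch.

    Hypothetical layout: a directory is a dict with "epoch" and "children",
    and a file is a dict with only "epoch".
    """
    if entry["epoch"] <= last_replicated_epoch:
        return  # unchanged: skip this entry and everything beneath it
    children = entry.get("children")
    if children is None:
        yield path  # a changed file
        return
    for name, child in children.items():
        yield from changed_files(child, f"{path}/{name}", last_replicated_epoch)


home = {"epoch": 2, "children": {
    "alice": {"epoch": 2, "children": {
        "diagram.svg": {"epoch": 1},
        "report.doc": {"epoch": 2},
    }},
    "bob": {"epoch": 1, "children": {
        "project.psd": {"epoch": 1},
    }},
}}

result = list(changed_files(home, "/home", last_replicated_epoch=1))
print(result)
# prints ['/home/alice/report.doc']
```

Note that the function returns as soon as it sees the epoch of bob/, so project.psd is never examined.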

How about partially-modified files? If only a few bytes of a large file have changed it would be a waste of time and bandwidth to re-replicate that file entirely. It is far more efficient to replicate only the subset of blocks of the file that have actually changed. Because the Qumulo file system stores a file’s data in a tree-like structure of chunks of data, the same last modified epoch concept that allows efficient identification of changed files within a directory tree also extends to the efficient identification of changed data blocks within each individual file.

As an example, suppose a 3MB file is created and replicated in epoch 1, after which a user overwrites only the middle of the file with new data. The file’s data tree would then look something like this:
file
|– 0 – 1MB (epoch=1)
|– 1 – 2MB (epoch=2)
`– 2 – 3MB (epoch=1)

We find the updated chunks of data in a file in the same way we find the updated files in a directory — by examining the last modified epochs in its tree. In the example above, we skip the first and last chunk of the file and only replicate the data in the middle with the updated epoch, doing only a third of the work we’d have to do if we copied the entire file.
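The same comparison can be sketched for the data tree of a single file. The dict layout below, including the root node that carries the file's own epoch, is an assumption for illustration and not Qumulo's actual on-disk format.

```python
MB = 1024 * 1024


def changed_ranges(node, last_replicated_epoch):
    """Return (start, end) byte ranges modified after last_replicated_epoch."""
    if node["epoch"] <= last_replicated_epoch:
        return []  # unchanged subtree: none of its data needs to be sent
    children = node.get("children")
    if not children:
        return [(node["start"], node["end"])]  # a changed leaf chunk
    ranges = []
    for child in children:
        ranges.extend(changed_ranges(child, last_replicated_epoch))
    return ranges


# The 3MB file from the example, with only its middle chunk overwritten.
data_tree = {"start": 0, "end": 3 * MB, "epoch": 2, "children": [
    {"start": 0 * MB, "end": 1 * MB, "epoch": 1},
    {"start": 1 * MB, "end": 2 * MB, "epoch": 2},
    {"start": 2 * MB, "end": 3 * MB, "epoch": 1},
]}

ranges = changed_ranges(data_tree, last_replicated_epoch=1)
sent = sum(end - start for start, end in ranges)
print(ranges)
# prints [(1048576, 2097152)]
print(f"{sent // MB} MB of {data_tree['end'] // MB} MB")
# prints 1 MB of 3 MB
```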

In this way, epochs act as a trail to the files and data within the files that were modified between replicated snapshots, enabling Qumulo’s continuous replication system to identify all of the differences in the file system in a time that is proportional to the size of those differences, not the total size of the file system!
