“You guys make it so intuitive and too easy, it gives me confidence. Why can’t everything be this easy?”

June 27, 2023

Authored by:

Qumulo Team

“You guys make it so intuitive and too easy, it gives me confidence. Why can’t everything be this easy?”

— Anonymous Customer

In my first post, our discussion on upgrades ended with some caveats to keep in mind regarding replication. This time we’ll delve into a few things that hit the customer success table and how best to address them.

Let’s start where we left off: versioning. Starting with version 5.0.1, Qumulo enforces a two-quarter rule when it comes to replication (if you remember from last month, quarterly releases are designated by the third digit being a 0). A cluster on version 5.3.0, for example, can replicate with clusters running anything from two quarterlies prior (5.1.0 and 5.2.0) through two quarterlies after (6.0.0 and 6.1.0). It is important to note that a dot release above the last quarterly in the window is not within the range. Using our example of 5.3.0, 6.1.0 is the hard limit because that’s the second quarterly after it. 6.1.1 is a whole new ballpark as far as replication is concerned, and 5.3.0 will flat out refuse to talk to it.
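If you like to see a rule as code, here is a minimal Python sketch of the two-quarter window. It is an illustration only, not how the product implements the check. The function names are made up for this post, and the assumption that each major version has four quarterlies (x.0.0 through x.3.0) is inferred from the 5.3.0 to 6.0.0 step in the example above. The post only describes the window for a quarterly release, so the sketch refuses to guess for dot releases.

```python
# Assumption: four quarterlies per major version (x.0.0 .. x.3.0),
# inferred from the example where 5.3.0 is followed by 6.0.0.
QUARTERLIES_PER_MAJOR = 4


def parse_version(version):
    """Turn '5.3.0' into the tuple (5, 3, 0)."""
    return tuple(int(part) for part in version.split("."))


def quarterly_window(version, span=2):
    """Return the (lowest, highest) versions a quarterly release can replicate with."""
    major, minor, patch = parse_version(version)
    if patch != 0:
        # The window for dot releases is not described here, so do not guess.
        raise ValueError("this sketch only handles quarterly (x.y.0) releases")
    index = major * QUARTERLIES_PER_MAJOR + minor
    low_major, low_minor = divmod(index - span, QUARTERLIES_PER_MAJOR)
    high_major, high_minor = divmod(index + span, QUARTERLIES_PER_MAJOR)
    return (low_major, low_minor, 0), (high_major, high_minor, 0)


def can_replicate(local_version, remote_version):
    """True if remote_version falls inside local_version's two-quarter window."""
    lowest, highest = quarterly_window(local_version)
    return lowest <= parse_version(remote_version) <= highest


print(quarterly_window("5.3.0"))        # ((5, 1, 0), (6, 1, 0))
print(can_replicate("5.3.0", "6.1.0"))  # True
print(can_replicate("5.3.0", "6.1.1"))  # False
```

Because versions are compared as tuples, 6.1.1 sorts just above the 6.1.0 ceiling, which is exactly why it falls outside the window.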

You may ask yourself, what are my options if I inadvertently upgrade outside of the two-quarter window? In this case, you would need to upgrade the source side of the relationship. Along these lines, the Qumulo Customer Success team is often asked, “Should I pause the replication during the upgrade?” No sir, you should not. The replication engine is one smart cookie and will pick back up when the upgrade is done. When it comes to upgrades and replications, there ain’t nothing to it but to do it.

Another question that hits the halls of Customer Success relates to changing things on the source, such as renaming a source directory. Say, for example, we’re replicating ‘/financials/current_fiscal’, and we want to rename it to ‘/financials/FY2023’. Will we end up re-replicating all the data? Big negatory there. Since we changed the name, it will trigger a verification, but won’t actually re-replicate anything. Instead, in the replication logs you’d end up seeing the ‘Skipped’ value grow in size as the “little replication engine that could” traverses the data and ensures nothing is waiting to be transferred.

What happens if you have a directory structure of ‘/zoo/animals/mammals/capybara’, but only want to set up a special policy on ‘mammals’? Will it copy the directory structure over for you, or do you need to create that? Our development team was looking out for you that day: the Qumulo Replication engine will create the preceding directories for you, recreating the tree as ‘/zoo/animals/mammals/’ on the destination as that policy takes effect.
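To picture what “creating the preceding directories” means, here is a small Python sketch that lists the directory chain for a replicated path, root-most first. It only illustrates the idea using the standard `pathlib` module; the helper name is hypothetical and nothing here calls into a Qumulo cluster.

```python
from pathlib import PurePosixPath


def directories_to_create(source_dir):
    """List every directory in the chain leading to source_dir, root-most first."""
    path = PurePosixPath(source_dir)
    # path.parents runs from the nearest parent up to "/", so reverse it
    # and drop the root itself, which always exists.
    chain = [p for p in list(path.parents)[::-1] if p != PurePosixPath("/")]
    chain.append(path)
    return [str(p) for p in chain]


print(directories_to_create("/zoo/animals/mammals/"))
# ['/zoo', '/zoo/animals', '/zoo/animals/mammals']
```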

Let’s stop for a moment and discuss policies. Do I need snapshot policies? Not at all. You can set up a vanilla continuous replication relationship with no policies required. Whether or not it makes sense to do it without a policy is a business decision. However, typically we recommend policies – mostly for extended configuration purposes.

With policies, you can set up replication as ‘policy+continuous’. This gives you the ability to take and transfer snapshots on the source at specific times, and to set a snapshot expiration on the target side. Without a policy, you’ll be limited to the single replication snapshot on the destination side (unless you enable snapshotting there). As you grow the number of replications to your disaster recovery site, you’ll likely want to investigate implementing policies if you haven’t already.

Finally, let’s consider local users and replication. If you have local users defined on the source cluster, do you need to define the same local users on the destination? As a best practice, you should. The cluster is going to reference the NFS UID if local permissions were set on files. If those files are replicated and the UID isn’t set on the destination side, the cluster won’t know who should get the permissions, so it will stop replicating and kick out an error along the lines of:

“Last attempt: /data/ cannot be replicated because it belongs to a local user. Either remove all local users and groups from the file or ensure that all local users and groups have associated NFS IDs and edit the relationship to enable mapping local IDs to NFS IDs.”

You will need to check the local users on the source (Cluster -> Local Users and Groups) and recreate them on the destination. Then edit the relationship and check the box for ‘Map Local User/Group IDs to Associated NFS IDs’. The replication engine will automatically retry after 60 seconds, so as long as you did everything right, replication will pick back up and keep on trucking.
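If you have more than a handful of local users, a quick comparison can save you a few retry cycles. The Python sketch below assumes you have already written down the local users from each cluster as simple lists of dictionaries. The `name` and `nfs_uid` field names and the helper itself are hypothetical, chosen for this example rather than taken from any Qumulo API or export format.

```python
def find_unmappable_users(source_users, destination_users):
    """Return (name, reason) pairs for source users that would block replication.

    Each user is a dict like {"name": "alice", "nfs_uid": 2001};
    an nfs_uid of None means no NFS ID is associated with the user.
    """
    destination_uids = {
        user["nfs_uid"]
        for user in destination_users
        if user.get("nfs_uid") is not None
    }
    problems = []
    for user in source_users:
        uid = user.get("nfs_uid")
        if uid is None:
            problems.append((user["name"], "no NFS UID on the source"))
        elif uid not in destination_uids:
            problems.append((user["name"], f"NFS UID {uid} missing on the destination"))
    return problems


source = [
    {"name": "alice", "nfs_uid": 2001},
    {"name": "bob", "nfs_uid": None},
    {"name": "carol", "nfs_uid": 2003},
]
destination = [{"name": "alice", "nfs_uid": 2001}]

for name, reason in find_unmappable_users(source, destination):
    print(f"{name}: {reason}")
# bob: no NFS UID on the source
# carol: NFS UID 2003 missing on the destination
```

An empty result suggests the users line up; you still need the mapping checkbox enabled on the relationship for replication to proceed.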

Ideally this has dispelled some confusion about replication and how it relates to your Qumulo cluster. As always, if you have any questions, please don’t hesitate to reach out in your Slack channel, and your friendly Qumulo Customer Success Engineer will be happy to help. Come back soon, and we will dive into the wonderful world of snapshots.

Until next time!