When we first started building our file system, we knew we wanted to be able to control and inspect the file system with command-line tools, UI, and automated tests. Using a REST API to expose this functionality was a natural fit, and we realized that if we wanted this functionality inside Qumulo, our customers would likely want it, too. So, we chose to make our REST API public from the beginning.
In this post, we’ll explore the tenets of our API, challenges with REST, and how we continually evolve the API alongside our file system.
Representational State Transfer (REST) is a widely used architectural style that we assume you’re already familiar with. Defining a new REST API involves many choices along the way. First, you need to decide what kind of functionality goes in your API. Control plane (system configuration and statistics)? Data plane (files and metadata stored in the file system)? We chose both, plus internal-only endpoints to assist feature development. Everything that can be done on a Qumulo cluster can be done through the REST API.
Next, we considered response content. When you use file system protocols like SMB or NFS to read metadata, you get that protocol’s interpretation of file system state, which can be limited in what it can express. Our REST API, in contrast, returns ground truth: clients do not need to interpret the returned data, and we return all the information available. As we expand the capabilities of the file system (such as storing aggregated metadata), we augment our endpoints to expose them.
Inspired by the REST API Design Rulebook, we classify each of our endpoints as one of that book’s resource archetypes, such as document, collection, or controller.
In structuring our URIs, we wanted to make it simple for developers and admins to script against our endpoints or use tools like cURL. We also wanted to make sure that clients don’t accidentally break if an endpoint’s contract changes. This led us to put more information in the URI, such as the version number, favoring explicit contracts over implicit ones. For example, here’s how to read a directory:
With that design, the only HTTP header we require is Authorization for OAuth2-style bearer tokens:
curl -k -X GET -H "Authorization: Bearer <token>" https://server:8000/v1/file-system
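For scripts written in a general-purpose language, the same request can be built with Python’s standard library. This is a minimal sketch, not an official Qumulo client: `build_request`, `insecure_context`, and `send` are hypothetical helper names, and `server` and the token are placeholders, as in the cURL example. As with `-k`, certificate verification is disabled, which is only appropriate for testing against a cluster with a self-signed certificate.

```python
import ssl
import urllib.request

# Placeholder address, as in the cURL example; substitute your cluster's host.
BASE_URL = "https://server:8000"


def build_request(method: str, path: str, token: str) -> urllib.request.Request:
    """Build a request whose only custom header is the OAuth2-style bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


def insecure_context() -> ssl.SSLContext:
    """Rough equivalent of curl -k: skip certificate and hostname checks (testing only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before relaxing verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def send(req: urllib.request.Request) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(req, context=insecure_context()) as resp:
        return resp.read()
```

Usage: `send(build_request("GET", "/v1/file-system", token))` issues the same call as the cURL command above.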
It’s worth noting that we did not try to mimic existing file system REST APIs. We wanted our API to be specific to our file system’s capabilities and to give the user maximum control over the system. If we ever want to support clients that speak S3, WebDAV, or other protocols, we’ll add new ports for them, keeping them separate from our core REST API.
Many of our configuration endpoints have straightforward behavior: you use GET to retrieve a document (e.g., GET /v1/users/123) and PUT or PATCH to update it. Requests take effect immediately, so when you receive a 200 OK response, you know the change has been made.
But REST is neither stateful nor transactional, which can impact the user experience if not considered properly. Let’s say an administrator is editing a file share on the cluster using the built-in UI. Between the time the UI retrieves the file share details and when the administrator saves their changes, another user or process could change that file share. By default in our API, the last writer wins, so the administrator would unwittingly clobber these changes. That’s not the user experience we want, so we use ETag and If-Match HTTP headers for all of our documents to prevent accidental overwrites. When the UI retrieves a document, it reads the ETag response header (entity tag, or essentially a hashcode) and stores that. Later, when updating that same document, the UI sends an If-Match request header, which tells the cluster to only perform the action if the document is the same as we expect. If the document changed, we’ll get back a 412 Precondition Failed response, which allows us to build a better experience for the user.
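The ETag/If-Match flow described above is a form of optimistic concurrency control. The sketch below simulates it in memory rather than over HTTP; `DocumentServer` is a hypothetical stand-in for a document endpoint such as a file share, and computing the ETag as a SHA-256 hash of canonical JSON is an assumption chosen to match the "essentially a hashcode" description, not Qumulo’s actual scheme.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def compute_etag(doc: dict) -> str:
    """Hash a canonical JSON encoding, so any change to the document changes the tag."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode()).hexdigest() + '"'


@dataclass
class Response:
    status: int
    body: Optional[dict] = None
    etag: Optional[str] = None


class DocumentServer:
    """Hypothetical in-memory document endpoint supporting conditional updates."""

    def __init__(self, doc: dict):
        self._doc = dict(doc)

    def get(self) -> Response:
        # The client stores the returned ETag alongside the document.
        return Response(200, dict(self._doc), compute_etag(self._doc))

    def put(self, new_doc: dict, if_match: Optional[str] = None) -> Response:
        # Without If-Match, the last writer wins. With it, the write proceeds
        # only if the document still matches the version the client read.
        if if_match is not None and if_match != compute_etag(self._doc):
            return Response(412)  # Precondition Failed
        self._doc = dict(new_doc)
        return Response(200, dict(self._doc), compute_etag(self._doc))
```

In the file share scenario, the UI and another process both read the share; the other process saves first, so the UI’s conditional save returns 412 and the UI can reload and show the administrator the newer state instead of silently overwriting it.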
Long-running actions also require special consideration. To keep our REST API response times predictable, we process short-running requests synchronously and long-running requests asynchronously. We classify every endpoint in our API as short- or long-running so that clients know which kind of response to handle, which reduces client complexity. All GET, PUT, and PATCH operations on documents and collections are short-running requests that return 200 OK when successfully processed. In contrast, long-running requests are always a POST to a controller endpoint, which returns 202 Accepted with a URI to poll for completion status. For example, when joining a cluster to Active Directory, the client invokes the controller like this:
If the request is valid, the controller responds:
The client can then issue repeated GET /v1/ad/monitor calls while waiting for the join action to succeed or fail.
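A client-side polling loop for this pattern might look like the sketch below. It is hedged in several ways: `poll_until_done` is a hypothetical helper, the `state` field and the `SUCCEEDED`/`FAILED` values are assumptions (the actual monitor response format isn’t shown here), and `get_status` stands for whatever performs the GET on the monitor URI. The sleep and clock functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal states; the real monitor response format may differ.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def poll_until_done(
    get_status: Callable[[], Dict],
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call get_status (e.g. a GET on the monitor URI) until it reports a
    terminal state, raising TimeoutError if the deadline passes first."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        sleep(interval)
```

A fixed interval keeps the sketch simple; a real client might back off between polls to reduce load on the cluster during slow operations.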
To ensure our REST API keeps pace with our file system’s capabilities, the endpoints are auto-generated from code. This means that the file system, API, and API documentation are always in sync. Our build system prevents us from accidentally making changes to internal data structures that would result in REST API changes that break API clients. And by putting our API documentation in code, it stays current with the code.
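One way to keep endpoints, archetypes, and documentation in a single source of truth is to declare them next to the handler code and derive everything else from that declaration. The sketch below is an illustration of the idea, not Qumulo’s build system: the `endpoint` decorator, `REGISTRY`, and `generate_docs` are hypothetical, and apart from the `/v1/users/{id}` pattern (modeled on the GET /v1/users/123 example), the paths are invented. It also encodes the rule that controllers use POST and return 202 Accepted, while documents and collections return 200 OK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Archetype(Enum):
    DOCUMENT = "document"
    COLLECTION = "collection"
    CONTROLLER = "controller"


@dataclass(frozen=True)
class Endpoint:
    method: str
    path: str
    archetype: Archetype
    doc: str

    @property
    def success_status(self) -> int:
        # Controllers handle long-running work asynchronously (202 Accepted);
        # documents and collections are handled synchronously (200 OK).
        return 202 if self.archetype is Archetype.CONTROLLER else 200


REGISTRY: Dict[Tuple[str, str], Endpoint] = {}


def endpoint(method: str, path: str, archetype: Archetype) -> Callable:
    """Register a handler; its docstring becomes the endpoint's documentation."""
    def decorate(handler: Callable) -> Callable:
        if archetype is Archetype.CONTROLLER and method != "POST":
            raise ValueError(f"controller {path} must use POST")
        doc = (handler.__doc__ or "").strip()
        REGISTRY[(method, path)] = Endpoint(method, path, archetype, doc)
        return handler
    return decorate


def generate_docs() -> List[str]:
    """Render one documentation line per registered endpoint."""
    return [
        f"{e.method} {e.path} ({e.archetype.value}, {e.success_status}): {e.doc}"
        for e in sorted(REGISTRY.values(), key=lambda e: (e.path, e.method))
    ]


@endpoint("GET", "/v1/users/{id}", Archetype.DOCUMENT)
def get_user(user_id: str) -> dict:
    """Return the user with the given ID."""
    return {"id": user_id}


@endpoint("POST", "/v1/example/jobs/start", Archetype.CONTROLLER)
def start_job() -> dict:
    """Start a hypothetical long-running job."""
    return {"monitor_uri": "/v1/example/jobs/monitor"}
```

Because the documentation strings and status codes are computed from the same registrations the server uses, they cannot drift out of sync with the handlers.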
Two years into the development of our REST API, we realized we had a problem: the API had grown organically as different dev teams added functionality, which led to inconsistencies between endpoints and a questionable hierarchy that made it difficult to discover functionality. To address this, we did two things: we migrated to a new API namespace over a series of releases to fix consistency and discoverability issues, and we created an API roadmap for Qumulo engineers to follow that allows the API to evolve and remain consistent. An example of an API namespace improvement was the consolidation of all real-time analytics-related functionality under /v1/analytics. Previously, this functionality was scattered across the entire namespace, and when we heard from customers that they couldn’t find these features, we knew this was an area to improve.
Now that we’ve solidified our /v1 API, individual endpoints can change version if a breaking change is needed. (Breaking changes include things like adding new required fields to requests, or changing the semantics of data we return.) Even with this provision, breaking changes are a last resort. We strive to find ways to augment response data or introduce optional fields without impacting existing API clients.
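Some breaking changes can be caught mechanically by comparing an endpoint’s old and new schemas, which is the kind of check a build system can run. The sketch below is hypothetical (`Field`, `breaking_changes`, and the schema shape are invented for illustration). It flags newly required request fields, as described above, and, as an added assumption, removed or retyped response fields. Changes in the semantics of returned data cannot be detected this way and still need human review.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    kind: str  # e.g. "string", "int"
    required: bool = False


Schema = Dict[str, Field]


def breaking_changes(old_req: Schema, new_req: Schema,
                     old_resp: Schema, new_resp: Schema) -> List[str]:
    """List changes that would break existing clients of an endpoint."""
    problems = []
    # A request field that is new, or newly required, breaks clients that omit it.
    for name, field in new_req.items():
        old = old_req.get(name)
        if field.required and (old is None or not old.required):
            problems.append(f"request field '{name}' is newly required")
    # Assumption: clients may rely on every field they previously received.
    for name, field in old_resp.items():
        if name not in new_resp:
            problems.append(f"response field '{name}' was removed")
        elif new_resp[name].kind != field.kind:
            problems.append(f"response field '{name}' changed type")
    return problems
```

Adding optional request fields or new response fields produces no findings, which matches the preference for augmenting endpoints rather than versioning them.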
In this post, we explored the tenets of Qumulo’s REST API, how we tackled some challenges with REST, and our approach for evolving the API in conjunction with the product.
We are always looking for new challenges in enterprise storage. Drop us a line and we will be in touch.