To provide more visibility into the way that Qumulo clusters are used and managed, we recently introduced an audit log. All you need to use it is a server running a remote syslog, which many monitoring and alerting platforms have out of the box. Once pointed at a log server, each node sends records of everything that happens, such as who did what, and whether it worked or not. When users interact with the cluster directly, like through the API, the log simply reports each user action.
Queries to most API endpoints will write simple, single entries to the audit log. For instance, if I used our CLI to run qq restart_cluster, the audit log would show this:
Let’s break that down.
The first three fields are the standard syslog header. They include the time the operation was completed, the name of the node that did the operation (Blog-1), and “qumulo”. The formatting of these can be easily changed by the program receiving your logs, to make the whole audit log CSV for instance. They’ll be left out from here on.
Next comes the user that performed the operation. “Colin” is my local username, and “::1” is my IP. It’s the IPv6 version of localhost, because I’m using a “simcluster” for testing. More on that here:
Last is the operation that was executed. The Qumulo CLI tools are built on the API, so this operation is labeled with “api”. Other operations could have “smb2” or “nfs3” in that place. “cluster_restart” is the command that I ran. If the operation had failed, “ok” would instead show the error like “api_permission_denied_error”. The rest are blank fields used by operations that need more detail.
The same filesystem action over any protocol should look nearly identical in the audit log. For instance, this is the log of me creating one file over NFS and one over SMB:
… "1001",nfs3,fs_create_file,ok,3,"/nfs_file","" … "1001",nfs3,fs_write_metadata,ok,3,"/nfs_file","" … "Colin",smb2,fs_create_file,ok,4,"/smb_file","" … "Colin",smb2,fs_write_metadata,ok,4,"/smb_file",""
The only differences are the file, the protocol, and the user, which shows 1001 because over NFS the only user identities are UIDs. Both times it writes metadata separately. The advantage of this approach is that searching through the log doesn’t require knowing the protocol of the operation you’re looking for. To find the deletes, just search “delete”, no need to know that SMB calls it “unlink”.
So far so good. If I just mount the NFS export “/” again, what operations will be logged?
… "0",nfs3,nfs_mount,ok,,"","" … "0",nfs3,fs_read_metadata,ok,2,"/","" … "1001",nfs3,fs_read_metadata,ok,2,"/","" … "1001",nfs3,fs_read_metadata,fs_no_such_entry_error,,"/.Trash","" … "1001",nfs3,fs_read_metadata,fs_no_such_entry_error,,"/.Trash-1001",""
That’s surprising! I only mounted the export, I didn’t touch any of those other files. It does make some sense that the operating system would want to check for permissions on the directory we just mounted, and on the second and third lines we do that first as root (UID 0), then with my UID, 1001. The 2,”/” at the end of the line means we looked at the root directory, which has file ID 2. The last two lines are more surprising, but it turns out they are a check for a Unix-y type of recoverable trash can, one for everyone and one for my UID specifically. https://specifications.freedesktop.org/trash-spec/trashspec-1.0.html
There are all kinds of quirks like this that the audit log reveals. Here’s an SMB mount of the default share “Files” from macOS 10.
… "Colin",smb2,smb_login,ok,,"","" … "Colin",smb2,share_connect,ok,2,"\\Blog-1\FILES","" … "Colin",smb2,fs_fsstat,ok,2,"/","" … "Colin",smb2,fs_read_metadata,ok,2,"/","" … "Colin",smb2,fs_list_directory,ok,2,"/","" … "Colin",smb2,share_connect,ok,0,"\\Blog-1\IPC$","" … "Colin",smb2,fs_open,fs_no_such_entry_error,,"/.hidden",""
Okay, we login, connect to the share, look at its space usage (fsstat), then check permissions and contents of the directory, fine. But then it tries to mount a share I don’t even have, “IPC$”.
And it succeeds! I found some documentation, but I had to actually go through old release notes to find out that this is a special hidden share for programs to communicate over named pipes. Last, it fails to open the “/.hidden” file, apparently “one of three ways a file can be made invisible in OS X. This file is semi-obsolete.” (Source: http://www.westwind.com/reference/OS-X/invisibles.html)
There are actually even more of these strange operations than the audit log shows. For instance, if I create two directories using mkdir in the SMB share…
… "Colin",smb2,fs_read_metadata,ok,2,"/","" … "Colin",smb2,fs_list_directory,ok,2,"/","" … "Colin",smb2,fs_create_directory,ok,5,"/dir1/","" … "Colin",smb2,fs_open,fs_no_such_entry_error,,"/","" … "Colin",smb2,fs_create_directory,ok,6,"/dir2/",""
When I make the first directory, it looks like the OS proactively checks that we aren’t overwriting anything, but it doesn’t seem to do that for the second directory. In fact, the audit log de-duplicates identical filesystem operations within one minute, so those are still happening, we just aren’t seeing them. This has the most benefit when reading and writing large files, which are typically read and written in chunks of about a megabyte. Not only does this make the log significantly smaller for storage, it frees up network bandwidth for more important things.
These are all simple examples; I’ve tried auditing the process of opening a project in my IDE and it results in pages of failed read and open actions on config files I’ve never heard of. To the end-user this is fast and just works. But the Qumulo cluster can’t tell what the underlying action the user is trying to take is, so it all ends up in the audit log. And that’s the reason that the audit log is so noisy: it records what the computer does, not the user.
I’ll leave you with one final mystery. The second to last line in the last example:
It fails to open the directory we just added to, so I suppose it must be opening it as a file or symlink or something. But why? Isn’t it obviously still a directory?
Unless someone wants to spelunk through layers of OS X code, the world may never know.
Colin is a member of Qumulo’s technical staff in engineering.