Preparing to list a paid version of our Qumulo File Fabric (QF2) software in the AWS Marketplace meant our cloud engineering team had to solve some unique challenges.

This blog post is the first in a series in which we discuss some of those challenges and show how we initially solved them. We don’t claim these are the final solutions, or even the best ones; this is simply a way for us here at Qumulo to share what we have learned.

The main point of getting your product in the AWS Marketplace is to allow your customers to directly buy, deploy, and manage their infrastructure. For software vendors, it’s convenient because Amazon handles all of the billing and licensing.

AWS Marketplace: Making it easy for customers, convenient for vendors

For us, the biggest challenge is letting customers deploy the product with as little friction as possible. We all know the first few seconds with any product really matter, so the out-of-box experience is important.

To deliver software via the Marketplace, AWS encourages vendors to use CloudFormation, Amazon’s Infrastructure as Code mechanism, which is usually delivered in a JSON- or YAML-formatted file.

Some simple language constructs are missing in CloudFormation, which can complicate things a bit. Take looping, for instance. There is no facility to easily create multiple identical bits of infrastructure without resorting to AutoScaling, which may not be a good fit for your software. In a language like Python, I would simply set my upper bound, pump out an array of objects, and move on. With CloudFormation, you end up repeating code to create infrastructure.

For example, here at Qumulo, we support clusters of four or more nodes. Even at that minimum, I would have four copies of the EC2 Instance creation code. This becomes very unwieldy once we hit double-digit node counts.

"Resources": {
       "QF2Node1": {
           "Properties": {
               "ImageId": {
                   "Fn::FindInMap": [
                       "RegionMap",
                       {
                           "Ref": "AWS::Region"
                       },
                       "AMI"
                   ]
               },
               "InstanceType": {
                   "Ref": "InstanceType"
               },
               "KeyName": {
                   "Ref": "KeyName"
               },
               "NetworkInterfaces": [
                   {
                       "AssociatePublicIpAddress": "false",
                       "DeleteOnTermination": "true",
                       "DeviceIndex": 0,
                       "GroupSet": [
                           {
                               "Ref": "QumuloSecurityGroup"
                           }
                       ],
                       "SubnetId": {
                           "Ref": "SubnetId"
                       }
                   }
               ]
           },
           "Type": "AWS::EC2::Instance"
       },
       "QF2Node2": {
           "Properties": {
               "ImageId": {
                   "Fn::FindInMap": [
                       "RegionMap",
                       {
                           "Ref": "AWS::Region"
                       },
                       "AMI"
                   ]
               },
               "InstanceType": {
                   "Ref": "InstanceType"
               },
               "KeyName": {
                   "Ref": "KeyName"
               },
               "NetworkInterfaces": [
                   {
                       "AssociatePublicIpAddress": "false",
                       "DeleteOnTermination": "true",
                       "DeviceIndex": 0,
                       "GroupSet": [
                           {
                               "Ref": "QumuloSecurityGroup"
                           }
                       ],
                       "SubnetId": {
                           "Ref": "SubnetId"
                       }
                   }
               ]
           },
           "Type": "AWS::EC2::Instance"
       },
       "QF2Node3": {
           "Properties": {
               "ImageId": {
                   "Fn::FindInMap": [
                       "RegionMap",
                       {
                           "Ref": "AWS::Region"
                       },
                       "AMI"
                   ]
               },
               "InstanceType": {
                   "Ref": "InstanceType"
               },
               "KeyName": {
                   "Ref": "KeyName"
               },
               "NetworkInterfaces": [
                   {
                       "AssociatePublicIpAddress": "false",
                       "DeleteOnTermination": "true",
                       "DeviceIndex": 0,
                       "GroupSet": [
                           {
                               "Ref": "QumuloSecurityGroup"
                           }
                       ],
                       "SubnetId": {
                           "Ref": "SubnetId"
                       }
                   }
               ]
           },
           "Type": "AWS::EC2::Instance"
       },
       "QF2Node4": {
           "Properties": {
               "ImageId": {
                   "Fn::FindInMap": [
                       "RegionMap",
                       {
                           "Ref": "AWS::Region"
                       },
                       "AMI"
                   ]
               },
               "InstanceType": {
                   "Ref": "InstanceType"
               },
               "KeyName": {
                   "Ref": "KeyName"
               },
               "NetworkInterfaces": [
                   {
                       "AssociatePublicIpAddress": "false",
                       "DeleteOnTermination": "true",
                       "DeviceIndex": 0,
                       "GroupSet": [
                           {
                               "Ref": "QumuloSecurityGroup"
                           }
                       ],
                       "SubnetId": {
                           "Ref": "SubnetId"
                       }
                   }
               ]
           },
           "Type": "AWS::EC2::Instance"
       },
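For contrast, the loop alluded to above can be sketched in plain Python using only the standard library. This is an illustration, not our deployment script: build_node and build_resources are hypothetical names, and the property values simply mirror the JSON above.

```python
import json


def build_node():
    """Return one AWS::EC2::Instance resource as a plain dict.

    The properties mirror the hand-written JSON shown above.
    """
    return {
        "Type": "AWS::EC2::Instance",
        "Properties": {
            "ImageId": {
                "Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]
            },
            "InstanceType": {"Ref": "InstanceType"},
            "KeyName": {"Ref": "KeyName"},
            "NetworkInterfaces": [
                {
                    "AssociatePublicIpAddress": "false",
                    "DeleteOnTermination": "true",
                    "DeviceIndex": 0,
                    "GroupSet": [{"Ref": "QumuloSecurityGroup"}],
                    "SubnetId": {"Ref": "SubnetId"},
                }
            ],
        },
    }


def build_resources(node_count, prefix="QF2"):
    """Build a Resources section holding node_count identical nodes."""
    return {
        f"{prefix}Node{n}": build_node() for n in range(1, node_count + 1)
    }


if __name__ == "__main__":
    # Four nodes reproduces the fragment above; any other count is a
    # one-argument change instead of another copy-and-paste.
    print(json.dumps({"Resources": build_resources(4)}, indent=4, sort_keys=True))
```

Changing the argument from 4 to 12 yields twelve resource entries with no extra code, which is exactly what a static template cannot express on its own.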

Getting user input on cluster size

Another problem: without user input, how can I tell what size cluster the customer wants to create? CloudFormation can take inputs, but the limited conditional logic available in the language means that every possible input permutation would have to be accounted for in the template. Ideally, we could auto-generate a CloudFormation template per user, and then just launch the user into a cluster that is customized to their needs.

To solve this problem, we started creating a simple form that a customer could use to enter the size of cluster they’d like, receive immediate feedback about the capacity they will have access to, and then launch directly into CloudFormation. Behind the scenes, the form sends the cluster node count, instance type, and AWS region to a back-end script. That script is currently up on our GitHub page under the Cloud-Deployment-Samples folder. It leverages Troposphere, a Python library that lets you programmatically build bespoke CloudFormation templates. For example, the same 4-node cluster shown earlier can be created using this function:

from troposphere import FindInMap, GetAtt, Join, Output, Ref
import troposphere.ec2 as ec2

# add_nodes() takes a given Template object, a count of nodes to create, and
# a name to prefix all EC2 instances with. EC2 instances will be created with the
# naming structure of Prefix + Node + NodeNumber.
def add_nodes(t, nodes, prefix):
   nodes_list = []

   for x in range(0, nodes):
       node_name = prefix + "Node" + str((x + 1))
       t.add_resource(
           ec2.Instance(
               node_name,
               ImageId = FindInMap("RegionMap", Ref("AWS::Region"), "AMI"),
               InstanceType = Ref("InstanceType"),
               KeyName = Ref("KeyName"),
               NetworkInterfaces = [
                   ec2.NetworkInterfaceProperty(
                       AssociatePublicIpAddress = False,
                       GroupSet = [Ref("QumuloSecurityGroup")],
                       DeviceIndex = 0,
                       DeleteOnTermination = True,
                       SubnetId = Ref("SubnetId"),
                   )
               ]
           )
       )
       nodes_list.append(node_name)
  
   # Create a list containing the Private IPs of all nodes.
   output_ips = []
   for i in nodes_list:
       output_ips.append(GetAtt(i, "PrivateIp"))

   t.add_output(Output(
       "ClusterPrivateIPs",
       Description="Copy and paste this list into the QF2 Cluster Creation Screen",
       Value=Join(", ", output_ips),
   ))
   t.add_output(Output(
       "LinkToManagement",
       Description="Click to launch the QF2 Admin Console",
       Value=Join("", ["https://",GetAtt(nodes_list[0], "PrivateIp")]),
   ))
   t.add_output(Output(
       "InstanceId",
       Description="Copy and paste this instance ID into the QF2 Cluster Creation Screen.",
       Value=Ref(prefix + "Node1"),
   ))

At first glance, this doesn’t seem much shorter, but let’s break down what is actually happening here. This function creates any number of nodes using the configuration that the customer selects via other parts of the script. All of the nodes are created in the roughly 20 lines of the for loop. Everything after that provides some user-experience niceties.

# Create a list containing the Private IPs of all nodes.
output_ips = []
for i in nodes_list:
    output_ips.append(GetAtt(i, "PrivateIp"))

In this section, we grab every EC2 Instance’s private IP address and add it to a list for later. The Output sections that follow put text and links into the CloudFormation Outputs tab so the customer can complete setup and access their brand-new cluster. Currently, after a cluster is created, the customer navigates to the management page of any node, answers a challenge question about their AWS account (in this case, we ask for the instance ID of the node), and supplies a comma-delimited list of all of the nodes’ private IPs. The outputs provide a link to the management IP of the first node, its instance ID, and a comma-delimited list of all the private IPs. Everything you need to get the cluster up, all in one place.
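To make the three outputs concrete, here is a minimal standard-library sketch of the Outputs section that the Troposphere calls above are expected to render, given that Join, GetAtt, and Ref serialize to the Fn::Join, Fn::GetAtt, and Ref intrinsics. The helper name build_outputs is hypothetical and is not part of our script.

```python
import json


def build_outputs(node_names):
    """Return a CloudFormation Outputs section for the given node names.

    Mirrors the three outputs added by add_nodes(): the joined private IPs,
    a link to the first node, and the first node's instance ID.
    """
    private_ips = [{"Fn::GetAtt": [name, "PrivateIp"]} for name in node_names]
    first_node = node_names[0]
    return {
        "ClusterPrivateIPs": {
            "Description": "Copy and paste this list into the QF2 Cluster Creation Screen",
            # Fn::Join concatenates the resolved IPs with ", " at deploy time.
            "Value": {"Fn::Join": [", ", private_ips]},
        },
        "LinkToManagement": {
            "Description": "Click to launch the QF2 Admin Console",
            "Value": {
                "Fn::Join": ["", ["https://", {"Fn::GetAtt": [first_node, "PrivateIp"]}]]
            },
        },
        "InstanceId": {
            "Description": "Copy and paste this instance ID into the QF2 Cluster Creation Screen.",
            # Ref on an EC2 instance resolves to its instance ID.
            "Value": {"Ref": first_node},
        },
    }


if __name__ == "__main__":
    names = [f"QF2Node{n}" for n in range(1, 5)]
    print(json.dumps({"Outputs": build_outputs(names)}, indent=4))
```

The addresses themselves are only known once the stack is created, which is why the template carries Fn::GetAtt references rather than literal IPs.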

 

John McGovern has spent over a decade helping companies integrate critical technologies into their infrastructure. At Qumulo, he is responsible for building Qumulo for the cloud, helping customers scale their storage systems beyond the data center.
