The Scientific Computing and Imaging Institute (SCI) at the University of Utah is an internationally recognized leader in visualization, scientific computing, and image analysis. As one of eight permanent research institutes at the university, SCI is home to more than 200 students, staff, and faculty. Their work spans many domains, ranging from biomedical computing to geospatial analysis. At SCI, researchers not only perform data analysis; they also write the software that enables it.
With all this activity, it’s no surprise that SCI produces large amounts of unstructured data. The management challenge comes from the variety of the data sets. Some data sets are made up of very small files while others are made up of very large files. For example, a single file can be as large as four terabytes!
Nick Rathke, Assistant Director of Information Technology at SCI, spoke at this year’s SuperComputing conference in Denver and described how his team uses Qumulo File Fabric (QF2) to make sure the storage needs of every project are met. You can sign up to watch the full presentation below, but here are a few highlights from his talk.
The institute is a very cross-disciplinary group, doing a little of everything, but their bread and butter is image-based modeling. They take things like electron microscopy images, MRI scans, or any other sort of medical imaging, and use open-source scientific computing packages like Seg3D, Cleaver, and ShapeWorks for data acquisition and image processing. Then rendering applications like ImageVis3D and FluoRender are used to generate simulations and visualizations.
Here is the overall workflow:
If there is free storage space at a research university, it's going to get filled. Storage is a limited resource, and knowing where your data is going is important not only today, but for capacity planning in the future. If you're like SCI and are federally funded, you have to work through a grant cycle. It is important to know in advance what you need from your storage so you can fit that into your research budget.
The main goal for Nick and his team at SCI, of course, is to help researchers continue to do research. They never want to have to tell a researcher, "I'm sorry, we don't have space on our system." Capacity planning plays an important part in making sure there is enough space on the storage to continue research. Through the QF2 UI, it's easy to see when a user has written a large amount of data, which path that data went to, and who wrote it. The team can also export all of this data to a separate system with the Qumulo API for additional analysis later on.
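Pulling analytics out of the cluster for offline analysis is typically done against the cluster's REST API. As a rough sketch only: the endpoint path, port, and authentication scheme below are assumptions for illustration (the cluster hostname, directory path, and token are placeholders), so check them against your cluster's own API reference before use.

```python
# Hypothetical sketch: query per-directory capacity aggregates from a
# Qumulo cluster's REST API so the numbers can be archived elsewhere.
# The endpoint path, port 8000, and bearer-token auth are assumptions
# for illustration -- verify against your cluster's API documentation.
import json
import urllib.parse
import urllib.request


def aggregates_url(cluster: str, fs_path: str) -> str:
    """Build the (assumed) per-directory aggregates endpoint URL.

    The filesystem path is percent-encoded into a single URL segment,
    so "/" becomes "%2F".
    """
    encoded = urllib.parse.quote(fs_path, safe="")
    return f"https://{cluster}:8000/v1/files/{encoded}/aggregates/"


def fetch_json(url: str, token: str) -> dict:
    """GET the URL with a bearer token and decode the JSON response."""
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder cluster name and path, illustrative only.
    url = aggregates_url("qumulo.example.edu", "/projects/imaging")
    print(url)
```

A periodic job built on this pattern could append each day's per-directory totals to a database, giving the historical view that grant-cycle capacity planning needs.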
The analytics built into QF2 also give a way to analyze client performance. It doesn't matter how much storage is available if the system isn't performing well and nobody can use it. For researchers, performance ultimately translates into time to insight: how long it takes to get results back. In these settings, it is valuable to show a researcher the graphs available in QF2 as a way to encourage better data governance.
During the presentation, Rathke discussed the value QF2 brings to the data center, and their future plans with Qumulo. "With QF2, we can see all this data around capacity and performance, which helps us know where we have to go and how we have to plan this out. That is why over the next few years we're going to add more Qumulo nodes. Because it's a scale-out solution, we can literally add nodes with zero downtime. We plug them into the network, give them an IP, and they're up and going."
Watch Nick Rathke's full Supercomputing presentation (on BrightTALK)
We are always looking for new challenges in enterprise storage. Drop us a line and we will be in touch.