ADAS data challenges go far beyond capacity. Qumulo’s modern file system addresses these challenges with scale-out NAS and a hybrid cloud approach.
The development of technologies for autonomous driving is currently the hottest topic in the automotive industry. Billions of dollars and euros are being spent to develop a new generation of sensors and technologies that collect, process, correlate and analyze sensor data of different types.
Recent autonomous driving systems shown at CES 2019 are equipped with four short-wave LIDARs, five long-wave LIDARs, six electronically scanning radars (ESR), four short-range radars, a trifocal camera, one traffic light camera and other sensors. Even without LIDAR sensors, a typical test vehicle with current video and RADAR sensors produces up to 30 TB of raw data per day. This sheer volume of data creates challenges for typical data center infrastructure. For a fleet of 10 test cars that each operate about 200 days a year, roughly 60 PB of data is generated per year and needs to be stored, processed and managed.
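The 60 PB figure follows directly from the numbers above. As a minimal sketch of the arithmetic (the function name is illustrative, and decimal units with 1 PB = 1,000 TB are assumed):

```python
# Back-of-the-envelope estimate of yearly raw data volume for a test fleet.
# Assumption: decimal units (1 PB = 1000 TB), as storage vendors usually quote them.
TB_PER_PB = 1000


def fleet_data_pb(cars: int, days_per_year: int, tb_per_car_per_day: float) -> float:
    """Return the raw data volume the fleet produces per year, in petabytes."""
    return cars * days_per_year * tb_per_car_per_day / TB_PER_PB


if __name__ == "__main__":
    # Figures from the text: 10 cars, 200 driving days, up to 30 TB per car per day.
    yearly_pb = fleet_data_pb(cars=10, days_per_year=200, tb_per_car_per_day=30)
    print(f"Yearly raw data: {yearly_pb} PB")
```

Changing any of the three inputs (for example, adding LIDAR data to the daily volume) shows how quickly the requirement grows beyond what a single conventional filer can hold.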
Scale-out file systems like Qumulo’s ADAS storage solution can help to address these challenges with a very modern file system that scales to many petabytes. It runs on standard x64 hardware platforms from different vendors and in the cloud, it is very easy to manage, and can be integrated into any automation environment due to its complete API functionality.
Storage capacity challenges require scale-out NAS
Besides the enormous capacity volumes requiring modern scale-out NAS solutions, there are additional challenges to consider. For example:
- Data ingest is global. Due to the amount of data being collected in test vehicles, it cannot be uploaded online to a company’s data center. Instead, data needs to be uploaded to the storage system locally when the car is “at home.” Typically test cars drive in all regions of the world, which means data will be ingested to a central system from all parts of the world.
- Data access is global. Several data processing steps are also carried out from multiple regions. For example, data annotation is, to a large extent, a manual process performed by people all over the world; at least the portion of data used as training data for ML algorithms must be annotated or validated by humans.
- Data access must be fast. Several process steps, such as video processing, object recognition, and machine learning, require low-latency access to data.
- High bandwidth access. HIL (hardware in the loop) and SIL (software in the loop) testing require very high throughput. Typically many streams are read in parallel to reduce simulation times. Consider, for example, that 80 parallel streams of video data at 400 MB/s per stream require 32 GB/s of throughput from the storage system.
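Two of the points above come down to simple arithmetic: the aggregate read throughput for parallel simulation streams, and the time it would take to upload one day of vehicle data over a network link. The sketch below works through both. The 1 Gbit/s uplink is purely an assumed example value, not a figure from this article, and the calculation ignores protocol overhead.

```python
# Rough sizing helpers for the ingest and bandwidth requirements listed above.
# Assumption: decimal units throughout (1 GB = 1000 MB, 1 TB = 1e12 bytes).


def aggregate_throughput_gb_s(streams: int, mb_s_per_stream: float) -> float:
    """Total read throughput in GB/s when all streams run in parallel."""
    return streams * mb_s_per_stream / 1000


def upload_hours(data_tb: float, link_gbit_s: float) -> float:
    """Ideal time in hours to move data_tb terabytes over a link_gbit_s link."""
    bits = data_tb * 1e12 * 8              # terabytes -> bits
    seconds = bits / (link_gbit_s * 1e9)   # link speed in bits per second
    return seconds / 3600


if __name__ == "__main__":
    # 80 streams at 400 MB/s each, as in the HIL/SIL example.
    print(f"Aggregate throughput: {aggregate_throughput_gb_s(80, 400)} GB/s")
    # One day of data (30 TB) over a hypothetical 1 Gbit/s uplink.
    print(f"Upload time: {upload_hours(30, 1):.1f} hours")
```

With the assumed 1 Gbit/s link, a single day of data from one car would take well over two days to upload, which illustrates why data is ingested locally rather than sent online from the vehicle.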
These requirements are contradictory to some degree. For global ingest and access, object storage in the cloud seems to be the ideal solution, whereas fast, high-bandwidth access is what local file system storage provides. And while SIL simulations can run completely in the cloud, HIL requires that the physical devices under test (for example, the video cameras, LIDAR and RADAR sensors) be present in the validation. These devices obviously cannot be “placed” in the cloud, and they need to access local data with high throughput. At the same time, HIL and SIL applications typically require a file system for data access.
For that reason, I have seen companies running a hybrid approach where data is ingested and pre-processed in local data centers and then uploaded to the cloud where it’s centrally stored and indexed (see Figure 1).
Figure 1: Mixed environment with legacy local file storage and cloud storage
There are some disadvantages with this approach. The local file storage has typically grown over time and was not designed to handle this volume of data, nor is it well suited to exchanging data with the cloud. Management APIs either don’t exist or are completely different from those of the cloud instances.
How Qumulo addresses the ADAS data challenges
Qumulo is the most modern file system on the market. It can span many nodes, it can be deployed on various hardware platforms, and it can also run in the cloud. System management, APIs and access protocols are identical on-premises and in the cloud. Qumulo provides Terraform and CloudFormation scripts (AWS) so that a scale-out file system cluster in the cloud can be deployed in minutes. See in our video below how Qumulo provides choice and deployment flexibility:
Qumulo also provides all the enterprise features in the cloud that users expect from modern systems: multi-protocol access (NFS, SMB, FTP, HTTP), multi-site replication, snapshots, quotas, analytic capabilities for monitoring and planning, and others. Figure 2 illustrates such a hybrid implementation for ADAS development and HIL/SIL simulation in an AWS environment.
Figure 2: Scale-out file storage on-premises and in the cloud
By deploying such a hybrid solution with Qumulo, the borders between local file systems and the cloud are diminishing. The contradictory requirements mentioned above can be satisfied much more easily:
- Files can be ingested from anywhere to a local Qumulo file system
- Files can be replicated between data centers and cloud instances
- HIL simulations can run locally with low latency and high bandwidth
- SIL simulations can run on-premises or in the cloud (using cloud compute instances)
- Qumulo storage can be spun up in the cloud in minutes. This makes it well suited to temporary workloads, such as machine learning training or SIL simulations.
- Data can be tiered from Qumulo clusters (on-premises or in the cloud) to S3 buckets
- The Media Asset Management catalog system can access Qumulo via rich APIs for management and even for automation of data access.
- The management of a Qumulo instance is similar on-premises and in the cloud. This lowers administration costs compared to scenarios where different enterprise data storage solutions are deployed on-premises or in the cloud.
- Qumulo file systems grow on demand, whether you add a physical node on-premises or another compute instance in the cloud. In both cases, capacity and performance scale up linearly.
- Enterprise data storage features are similar in the cloud and on-premises
- Applications don’t need to be rewritten when they run in the cloud because Qumulo provides the same multi-protocol access in both environments.
Qumulo customers like Hyundai Mobis leverage Qumulo for analysis of hundreds of terabytes of video data from vehicle sensors used for designing and building assisted and autonomous cars. Qumulo’s cluster can ingest the steady stream of machine-generated data without constant management, which is a huge productivity benefit. To see how automotive companies like Hyundai use Qumulo’s hybrid cloud file storage for ADAS development, read our case study from Hyundai Mobis.