Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Mining, and Closed-Loop Technologies

Lincan Li, Wei Shao, Wei Dong, Yijun Tian, Qiming Zhang, Kaixiang Yang, Wenjie Zhang

TL;DR

This paper tackles data-centric evolution in autonomous driving by surveying big data systems, data mining, and closed-loop technologies. It builds a taxonomy of autonomous driving datasets across milestone generations and analyzes state-of-the-art closed-loop pipelines, including data generation, labeling, simulation, and OTA updates. It highlights empirical industrial studies of NVIDIA MagLev and Tesla's data platforms and discusses high-fidelity data generation via world models and generative AI. The authors discuss future directions such as 3rd-generation datasets, hardware acceleration, personalized driving recommendations, data security, and trustworthy AI. The work provides a repository for further development.

Abstract

Next-generation autonomous driving (AD) technology relies on the dedicated integration and interaction among intelligent perception, prediction, planning, and low-level control. The upper bound of AD algorithm performance has hit a major bottleneck, and academia and industry broadly agree that the key to overcoming it lies in data-centric autonomous driving technology. Recent advances in AD simulation, closed-loop model training, and AD big data engines have yielded valuable experience. However, systematic knowledge and deep understanding are still lacking on how to build efficient data-centric AD technology for AD algorithm self-evolution and better AD big data accumulation. To fill these research gaps, this article reviews state-of-the-art data-driven autonomous driving technologies, with an emphasis on a comprehensive taxonomy of autonomous driving datasets characterized by milestone generations, key features, data acquisition settings, etc. Furthermore, we provide a systematic review of the existing benchmark closed-loop AD big data pipelines from the industrial frontier, including the procedure of closed-loop frameworks, key technologies, and empirical studies. Finally, future directions, potential applications, limitations, and concerns are discussed to encourage efforts from both academia and industry to further the development of autonomous driving. The project repository is available at: https://github.com/LincanLi98/Awesome-Data-Centric-Autonomous-Driving.

Paper Structure (11 sections, 5 figures, 2 tables)

This paper contains 11 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: This work provides a systematic survey of benchmark closed-loop data-driven autonomous driving technologies from various aspects. The organization of the paper is illustrated above.
  • Figure 2: A comprehensive illustration of the development of open-source autonomous driving datasets, characterized by milestone generations. We emphasize the sensor modality, suitable tasks, places of dataset collection, and related challenges.
  • Figure 3: The workflows of two pioneering data-driven closed-loop autonomous driving pipelines: NVIDIA's MagLev AV Platform (left) and Tesla AutoPilot Data Platform (right).
  • Figure 4: The detailed workflows of mainstream AD data labeling pipelines. AD data labeling is usually task- or model-specific, with pre-defined requirements. It is usually not a one-time task but a cyclical procedure.
  • Figure 5: A real-world application diagram of an advanced closed-loop autonomous driving big data platform.