The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey

Sifan Tu, Xin Zhou, Dingkang Liang, Xingyu Jiang, Yumeng Zhang, Xiaofan Li, Xiang Bai

TL;DR

This survey assesses Driving World Models (DWM) for autonomous driving, focusing on predicting scene evolution from historical observations and actions across 2D, 3D, and scene-free paradigms. It systematically categorizes methods by predicted modalities, reviews core architectures (diffusion, transformers, latent-state models), and outlines applications in simulation, data generation, anticipative driving, and 4D pre-training. The authors compile high-impact datasets and task-specific metrics, discuss limitations (data scarcity, efficiency, reliability, multi-sensor fusion, and adversarial risks), and propose future directions such as unified tasks and language-assisted supervision. The goal is to clarify progress, identify gaps, and guide researchers toward broader, safer adoption of DWM in real-world autonomous driving.

Abstract

Driving World Model (DWM), which focuses on predicting scene evolution during the driving process, has emerged as a promising paradigm in pursuing autonomous driving. These methods enable autonomous driving systems to better perceive, understand, and interact with dynamic driving environments. In this survey, we provide a comprehensive overview of the latest progress in DWM. We categorize existing approaches based on the modalities of the predicted scenes and summarize their specific contributions to autonomous driving. In addition, high-impact datasets and various metrics tailored to different tasks within the scope of DWM research are reviewed. Finally, we discuss the potential limitations of current research and propose future directions. This survey provides valuable insights into the development and application of DWM, fostering its broader adoption in autonomous driving. The relevant papers are collected at https://github.com/LMD0311/Awesome-World-Model.

Paper Structure

This paper contains 15 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: DWM uses historical observations as the primary input to predict future scene evolution. Many methods also incorporate condition inputs or response outputs.
  • Figure 2: Applications of DWM: (a) responding to various types of instructions and faithfully simulating the corresponding scenarios; (b) generating diverse data with the same annotation; (c) optimizing planning by predicting future scenes; (d) enhancing downstream task performance and reducing reliance on annotations through 4D pre-training.