Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges

Rongxiang Zeng; Yongqi Dong

TL;DR

A unifying latent-space framework is proposed that synthesizes recent progress in world models for automated driving and organizes the design space by the target and form of latent representations and by structural priors for geometry, topology, and semantics.

Abstract

Emerging generative world models and vision-language-action (VLA) systems are rapidly reshaping automated driving by enabling scalable simulation, long-horizon forecasting, and capability-rich decision making. Across these directions, latent representations serve as the central computational substrate: they compress high-dimensional multi-sensor observations, enable temporally coherent rollouts, and provide interfaces for planning, reasoning, and controllable generation. This paper proposes a unifying latent-space framework that synthesizes recent progress in world models for automated driving. The framework organizes the design space by the target and form of latent representations (latent worlds, latent actions, latent generators; continuous states, discrete tokens, and hybrids) and by structural priors for geometry, topology, and semantics. Building on this taxonomy, the paper articulates five cross-cutting internal mechanics (i.e., structural isomorphism, long-horizon temporal stability, semantic and reasoning alignment, value-aligned objectives and post-training, and adaptive computation and deliberation) and connects these design choices to robustness, generalization, and deployability. The work also proposes concrete evaluation prescriptions, including a closed-loop metric suite and a resource-aware deliberation cost, designed to reduce the open-loop/closed-loop mismatch. Finally, the paper identifies actionable research directions toward advancing latent world models for decision-ready, verifiable, and resource-efficient automated driving.

Paper Structure (32 sections, 4 equations, 6 figures, 2 tables)

Introduction
Taxonomy of World Models for Automated Driving
Spatiotemporal World Modeling and Neural Simulation
Latent-Centric Planning and Reinforcement Learning
Generative Data Synthesis and Scene Editing
Cognitive Reasoning and Latent Chain-of-Thought
Internal Mechanics: Structure, Alignment, and Dynamics in Latent Representations
Structural Isomorphism and Geometric Priors
Temporal Dynamics and Long-Horizon Stability
Semantic and Reasoning Alignment
Value-Aligned Objectives and Post-Training
Adaptive Computation and Deliberation in Latent Rollouts
Evaluation Standards: Metrics and Benchmarks
Open-Loop Fidelity and Closed-Loop Stability
Benchmarks and Simulation Environments
...and 17 more sections

Figures (6)

Figure 1: A visual roadmap of the paper.
Figure 2: Taxonomy of world models for automated driving: A conceptual overview of Neural Simulation, Latent Planning, Data Synthesis, and Cognitive Reasoning within a unified latent-centric framework.
Figure 3: Internal mechanics of latent world models for automated driving.
Figure 4: Evaluation paradigms for latent world models in automated driving: open-loop evaluation (left), and closed-loop evaluation (right).
Figure 5: Open challenges for latent world models in autonomous driving.
...and 1 more figure
