Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving

Linbo Wang, Yupeng Zheng, Qiang Chen, Shiwei Li, Yichen Zhang, Zebin Xing, Qichao Zhang, Xiang Li, Deheng Qian, Pengxuan Yang, Yihang Dong, Ce Hao, Xiaoqing Ye, Junyu Han, Yifeng Pan, Dongbin Zhao

Abstract

We introduce Latent-WAM, an efficient end-to-end autonomous driving framework that achieves strong trajectory planning through spatially-aware and dynamics-informed latent world representations. Existing world-model-based planners suffer from inadequately compressed representations, limited spatial understanding, and underutilized temporal dynamics, resulting in sub-optimal planning under constrained data and compute budgets. Latent-WAM addresses these limitations with two core modules: a Spatial-Aware Compressive World Encoder (SCWE) that distills geometric knowledge from a foundation model and compresses multi-view images into compact scene tokens via learnable queries, and a Dynamic Latent World Model (DLWM) that employs a causal Transformer to autoregressively predict future world states conditioned on historical visual and motion representations. Extensive experiments on NAVSIM v2 and HUGSIM demonstrate new state-of-the-art results: 89.3 EPDMS on NAVSIM v2 and 28.9 HD-Score on HUGSIM, surpassing the best prior perception-free method by 3.2 EPDMS with significantly less training data and a compact 104M-parameter model.
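
To make the two modules concrete, below is a minimal PyTorch sketch of how a query-based compressive encoder and a causal latent world model could be wired together. Every class name, shape, and hyperparameter here is an illustrative assumption based only on the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatialAwareCompressiveWorldEncoder(nn.Module):
    """Hypothetical SCWE: learnable queries cross-attend to multi-view
    patch features (e.g. from a frozen geometric foundation model) and
    compress them into a small set of scene tokens."""
    def __init__(self, dim=256, num_scene_tokens=16, num_heads=8):
        super().__init__()
        self.scene_queries = nn.Parameter(torch.randn(num_scene_tokens, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_feats):
        # patch_feats: (B, num_views * num_patches, dim)
        q = self.scene_queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        tokens, _ = self.cross_attn(q, patch_feats, patch_feats)
        return self.norm(tokens)  # (B, num_scene_tokens, dim)

class DynamicLatentWorldModel(nn.Module):
    """Hypothetical DLWM: a causal Transformer that predicts the next
    latent world state from past scene tokens and ego-motion embeddings."""
    def __init__(self, dim=256, num_layers=4, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            dim, num_heads, dim_feedforward=4 * dim, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        self.motion_proj = nn.Linear(3, dim)  # assumed (x, y, heading) per step

    def forward(self, scene_seq, motion_seq):
        # scene_seq: (B, T, dim) per-frame pooled scene tokens
        # motion_seq: (B, T, 3) historical ego motion
        x = scene_seq + self.motion_proj(motion_seq)
        T = x.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        return self.backbone(x, mask=causal)  # (B, T, dim) next-state predictions

# Toy usage: 2 samples, 3 views x 196 patches each, a 4-frame history.
scwe = SpatialAwareCompressiveWorldEncoder()
dlwm = DynamicLatentWorldModel()
frame_tokens = scwe(torch.randn(2, 3 * 196, 256)).mean(dim=1)  # (2, 256)
history = frame_tokens.unsqueeze(1).repeat(1, 4, 1)            # (2, 4, 256)
future = dlwm(history, torch.randn(2, 4, 3))                   # (2, 4, 256)
```

Under this reading, SCWE trades the full multi-view patch grid for a handful of query-pooled scene tokens, and DLWM's causal mask is what permits teacher-forced training over a history window (cf. Figure 3).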

Paper Structure

This paper contains 33 sections, 18 equations, 15 figures, and 7 tables.

Figures (15)

  • Figure 1: Performance vs. training data scale on NAVSIM v2. Bubble size indicates model parameters. Our Latent-WAM achieves the highest EPDMS with significantly less training data and a smaller model, demonstrating superior data efficiency over existing world-model-based methods. World4Drive is marked separately because it employs an additional ViT-L depth estimator.
  • Figure 2: Overview of the Latent-WAM architecture; see the overall architecture section for details.
  • Figure 3: Teacher Forcing Attention Mask.
  • Figure 4: Visualization of planning trajectories, where the green line is the human trajectory and the yellow line is the predicted trajectory of the corresponding method.
  • Figure 5: Visualization of attention maps between scene tokens and image patches. From top to bottom, the three groups correspond to going straight, turning right, and turning left respectively.
  • ...and 10 more figures