Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving
Lingyu Xiao, Jiang-Jiang Liu, Sen Yang, Xiaofan Li, Xiaoqing Ye, Wankou Yang, Jingdong Wang
TL;DR
This work tackles the challenge of deriving robust actions from autoregressive world models in autonomous driving by addressing insufficient uncertainty modeling and self-delusion. It introduces LatentDriver, which combines a Latent World Model (LWM) with a Multi Probabilistic Planner (MPP) to model environment transitions and ego actions as a mixture distribution, using an intermediate action to mitigate self-delusion. The approach employs Gaussian mixtures for lateral actions and a Laplace model for yaw, with a bi-directional, stochastic interaction between world modeling and planning, and trains with a joint loss that blends world-model and planner objectives. On the Waymax closed-loop benchmark, LatentDriver achieves expert-level performance and outperforms state-of-the-art RL and IL baselines, validating its effectiveness for robust decision-making under uncertainty. The method offers practical impact by enabling better handling of long-tail driving scenarios while maintaining efficient, joint training and inference.
Abstract
The autoregressive world model exhibits robust generalization capabilities in vectorized scene understanding but encounters difficulties in deriving actions due to insufficient uncertainty modeling and self-delusion. In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses. We propose LatentDriver, a framework that models the environment's next states and the ego vehicle's possible actions as a mixture distribution, from which a deterministic control signal is then derived. By incorporating mixture modeling, the stochastic nature of decision-making is captured. Additionally, the self-delusion problem is mitigated by providing intermediate actions sampled from a distribution to the world model. Experimental results on the recently released closed-loop benchmark Waymax demonstrate that LatentDriver surpasses state-of-the-art reinforcement learning and imitation learning methods, achieving expert-level performance. The code and models will be made available at https://github.com/Sephirex-X/LatentDriver.
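To make the mixture formulation concrete, the following is a minimal sketch, not the released LatentDriver code. It assumes a diagonal Gaussian mixture over a two-dimensional action, and it assumes that the deterministic control is taken as the mean of the highest-weight component; the abstract does not specify the selection rule. The function names `select_action` and `sample_action` and all numeric values are hypothetical.

```python
import random


def select_action(weights, means):
    """Return the mean of the highest-weight mixture component.

    Assumption: the deterministic control is the mean of the most
    probable component. The actual rule is not given in the abstract.
    """
    best = max(range(len(weights)), key=lambda i: weights[i])
    return means[best]


def sample_action(weights, means, stds, rng):
    """Draw one action from a diagonal Gaussian mixture.

    A sampled action like this stands in for the intermediate action
    that is passed to the world model (illustrative only).
    """
    # Pick a component index with probability proportional to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each action dimension independently from that component.
    return tuple(rng.gauss(m, s) for m, s in zip(means[k], stds[k]))


# Hypothetical mixture with three components over a 2-D action.
weights = [0.2, 0.7, 0.1]
means = [(0.0, 0.0), (1.0, 0.5), (2.0, -0.5)]
stds = [(0.1, 0.1), (0.2, 0.1), (0.3, 0.2)]

rng = random.Random(0)
control = select_action(weights, means)               # deterministic output
intermediate = sample_action(weights, means, stds, rng)  # stochastic draw
```

Sampling keeps the stochastic hypotheses available during planning, while the final selection step yields a single control signal for the simulator.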
