Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving

Lingyu Xiao, Jiang-Jiang Liu, Sen Yang, Xiaofan Li, Xiaoqing Ye, Wankou Yang, Jingdong Wang

TL;DR

This work tackles the challenge of deriving robust actions from autoregressive world models in autonomous driving by addressing insufficient uncertainty modeling and self-delusion. It introduces LatentDriver, which combines a Latent World Model (LWM) with a Multiple Probabilistic Planner (MPP) to model environment transitions and ego actions as a mixture distribution, using an intermediate action to mitigate self-delusion. The approach employs Gaussian mixtures for lateral actions and a Laplace model for yaw, with a bi-directional, stochastic interaction between world modeling and planning, and trains with a joint loss that blends world-model and planner objectives. On the Waymax closed-loop benchmark, LatentDriver achieves expert-level performance and outperforms state-of-the-art RL and IL baselines, validating its effectiveness for robust decision-making under uncertainty. The method offers practical impact by enabling better handling of long-tail driving scenarios while maintaining efficient, joint training and inference.

Abstract

The autoregressive world model exhibits robust generalization capabilities in vectorized scene understanding but encounters difficulties in deriving actions due to insufficient uncertainty modeling and self-delusion. In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses. We propose LatentDriver, a framework that models the environment's next states and the ego vehicle's possible actions as a mixture distribution, from which a deterministic control signal is then derived. Mixture modeling captures the stochastic nature of decision-making. Additionally, the self-delusion problem is mitigated by providing the world model with intermediate actions sampled from a distribution. Experimental results on the recently released closed-loop benchmark Waymax demonstrate that LatentDriver surpasses state-of-the-art reinforcement learning and imitation learning methods, achieving expert-level performance. The code and models will be made available at https://github.com/Sephirex-X/LatentDriver.
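
To make the idea of a mixture distribution over actions concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional Gaussian mixture over a single action dimension, and it assumes that the deterministic control signal is taken as the mean of the most probable component; the abstract does not state the actual selection rule. The names `softmax`, `mixture_nll`, and `deterministic_action` are hypothetical.

```python
import math


def softmax(logits):
    """Turn unnormalized component scores into mixture weights."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_nll(target, logits, means, stds):
    """Negative log-likelihood of a scalar target under a 1-D Gaussian mixture."""
    weights = softmax(logits)
    density = 0.0
    for w, mu, sigma in zip(weights, means, stds):
        z = (target - mu) / sigma
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log(density)


def deterministic_action(logits, means):
    """Assumed rule: return the mean of the highest-weight component."""
    weights = softmax(logits)
    best = max(range(len(weights)), key=lambda k: weights[k])
    return means[best]


# Three hypotheses for one action dimension (illustrative values only).
logits = [0.1, 2.0, -1.0]
means = [-0.5, 0.0, 0.5]
stds = [0.2, 0.1, 0.3]

action = deterministic_action(logits, means)   # mean of the second component
loss = mixture_nll(0.05, logits, means, stds)  # training-style objective
```

Training such a head with a likelihood objective lets several hypotheses coexist, while the selection rule collapses them to a single command at execution time.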

Paper Structure (30 sections, 8 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Different designs of world model integration. Dashed arrows indicate the absence of gradient. (a) Treats the world model as a realistic simulator and selects the best action from multiple planners (actions). (b) Directly derives actions from the world model's latent space. (c) Our method models the environment’s next states and the ego vehicle’s next possible actions as a mixture distribution and derives the ultimate action from it.
  • Figure 2: Overall pipeline of LatentDriver, which proceeds in three steps. First, the class token from the scene encoder is fed into the Multiple Probabilistic Planner (MPP), which generates an intermediate action distribution $\bar{A}^I_{1:t}$ from its $I$-th layer. Next, the Latent World Model (LWM) generates the latent state distribution $\bar{\mathbf{s}}_{t+1}$ from $\mathbf{h}_{1:t}$ and $\bar{A}^I_{1:t}$. Finally, the execution signal is produced from the planner's $J$-th layer output, aided by $\bar{\mathbf{s}}_{t+1}$.
  • Figure 3: The percentages of the episode number for each driving scenario in the training and validation sets.
  • Figure 4: Visualization results of LatentDriver against three other methods in four driving scenarios. For a detailed explanation, please refer to Section \ref{sec.vis}.
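
The three-step data flow described in the Figure 2 caption can be sketched as follows. This is only an illustration of the ordering of the steps, under strong assumptions: every function here is a hypothetical toy stand-in (simple arithmetic instead of learned layers), the distributions are single Gaussians rather than mixtures, and none of the names come from the LatentDriver code base.

```python
import random


def planner_early_layers(class_token):
    """Toy stand-in for planner layers 1..I: (mean, std) of an intermediate action."""
    mean = sum(class_token) / len(class_token)
    return mean, 0.1


def latent_world_model(history, action_sample):
    """Toy stand-in for the LWM: (mean, std) of the next latent state."""
    mean = [h + action_sample for h in history[-1]]
    return mean, 0.05


def planner_late_layers(class_token, next_state_mean):
    """Toy stand-in for planner layers I+1..J: final action given the predicted state."""
    token_avg = sum(class_token) / len(class_token)
    state_avg = sum(next_state_mean) / len(next_state_mean)
    return 0.5 * token_avg + 0.5 * state_avg


def latentdriver_step(class_token, history, rng):
    # Step 1: intermediate action distribution from the planner's I-th layer.
    a_mean, a_std = planner_early_layers(class_token)
    # The world model is given a sampled intermediate action, not the final one.
    a_sample = rng.gauss(a_mean, a_std)
    # Step 2: latent next-state distribution from the history and that sample.
    s_mean, s_std = latent_world_model(history, a_sample)
    # Step 3: final execution signal from the later planner layers.
    final_action = planner_late_layers(class_token, s_mean)
    return final_action, (s_mean, s_std)


rng = random.Random(0)
action, (state_mean, state_std) = latentdriver_step([0.2, 0.4], [[0.0, 1.0]], rng)
```

The point of the ordering is that the world model conditions on an action drawn from the planner's intermediate distribution, and the planner's final output in turn conditions on the world model's prediction.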