Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving

Jiangxin Sun, Feng Xue, Teng Long, Chang Liu, Jian-Fang Hu, Wei-Shi Zheng, Nicu Sebe

TL;DR

A unified framework named Risk-aware World Model Predictive Control (RaWMPC) is proposed to address the generalization dilemma through robust control, without reliance on expert demonstrations, and outperforms state-of-the-art methods in both in-distribution and out-of-distribution scenarios.

Abstract

With advances in imitation learning (IL) and large-scale driving datasets, end-to-end autonomous driving (E2E-AD) has made great progress recently. Currently, IL-based methods have become a mainstream paradigm: models rely on standard driving behaviors given by experts, and learn to minimize the discrepancy between their actions and expert actions. However, this objective of "only driving like the expert" suffers from limited generalization: when encountering rare or unseen long-tail scenarios outside the distribution of expert demonstrations, models tend to produce unsafe decisions in the absence of prior experience. This raises a fundamental question: Can an E2E-AD system make reliable decisions without any expert action supervision? Motivated by this, we propose a unified framework named Risk-aware World Model Predictive Control (RaWMPC) to address this generalization dilemma through robust control, without reliance on expert demonstrations. Practically, RaWMPC leverages a world model to predict the consequences of multiple candidate actions and selects low-risk actions through explicit risk evaluation. To endow the world model with the ability to predict the outcomes of risky driving behaviors, we design a risk-aware interaction strategy that systematically exposes the world model to hazardous behaviors, making catastrophic outcomes predictable and thus avoidable. Furthermore, to generate low-risk candidate actions at test time, we introduce a self-evaluation distillation method to distill risk-avoidance capabilities from the well-trained world model into a generative action proposal network without any expert demonstration. Extensive experiments show that RaWMPC outperforms state-of-the-art methods in both in-distribution and out-of-distribution scenarios, while providing superior decision interpretability.
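
The selection step described in the abstract (predict the consequences of several candidate action sequences, then pick the one with the lowest evaluated risk) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_world_model` is a hand-written stand-in for a learned world model, `toy_risk` is a made-up cost that penalizes predicted lane departures, and the `(steer, throttle)` action format is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, float]  # hypothetical (steer, throttle) pair


def toy_world_model(state: float, actions: Sequence[Action]) -> List[float]:
    """Hypothetical stand-in for a learned world model.

    The state is a lateral offset from the lane centre; each action's
    steer value shifts it. Returns the predicted offset after each step.
    """
    predicted = []
    for steer, _throttle in actions:
        state = state + steer
        predicted.append(state)
    return predicted


def toy_risk(states: Sequence[float], lane_half_width: float = 1.0) -> float:
    """Hypothetical risk score: heavy penalty per predicted lane departure,
    plus the mean absolute offset as a mild tie-breaker."""
    violations = sum(1 for s in states if abs(s) > lane_half_width)
    mean_offset = sum(abs(s) for s in states) / len(states)
    return 10.0 * violations + mean_offset


def select_low_risk_action(
    state: float,
    candidates: Sequence[Sequence[Action]],
    world_model: Callable[[float, Sequence[Action]], List[float]],
    risk_fn: Callable[[Sequence[float]], float],
) -> Tuple[Sequence[Action], float]:
    """Roll out every candidate sequence and return the lowest-risk one."""
    scored = [
        (risk_fn(world_model(state, seq)), idx)
        for idx, seq in enumerate(candidates)
    ]
    best_risk, best_idx = min(scored)  # ties resolve to the lowest index
    return candidates[best_idx], best_risk


if __name__ == "__main__":
    demo_candidates = [
        [(0.3, 0.5)] * 3,   # drifts out of the lane
        [(-0.2, 0.5)] * 3,  # steers back toward the centre
        [(0.0, 0.5)] * 3,   # holds the current offset
    ]
    best, risk = select_low_risk_action(0.8, demo_candidates, toy_world_model, toy_risk)
    print(best, risk)
```

In the paper's setting the world model and the risk evaluation are learned, and the candidates come from a proposal network at test time; the sketch only shows the evaluate-then-select control flow.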

Paper Structure (37 sections, 14 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Comparison between existing E2E-AD methods and RaWMPC. The first row shows the predicted trajectories, and the second row compares the core workflows. Black arrows denote test-time execution, while pink arrows indicate training-only steps. The comparison shows that prior methods often omit explicit hazard modeling and may trigger traffic violations, whereas RaWMPC uses a risk-aware world model to evaluate action consequences and select safe, compliant actions in critical scenes.
  • Figure 2: Overview of RaWMPC. Multi-view images $\mathbf{I}_t$, ego state $\mathbf{M}_t$, and candidate action sequences $\{\mathbf{A}^{n}_{t:t+H-1}\}_{n=1}^{N}$ are encoded and rolled out by a world model over horizon $H$. Three decoders predict semantic segmentation, semantic-guided traffic events, and future ego states, enabling action evaluation for predictive control. Training combines offline warm-up on logged trajectories with online simulator interaction using world-model-guided exploration.
  • Figure 3: Different action-selection ranges under three driving modes in online simulator interaction. Red denotes high cost and green denotes low cost. The rand mode samples uniformly from all candidates, the bad mode samples from the high-cost region, and the good mode samples from the low-cost region.
  • Figure 4: Self-Evaluation Distillation for Policy Learning. A cVAE is trained with RaWMPC-scored actions in a contrastive manner, pulling the condition prior toward positives and pushing it away from negatives. The well-trained decoder serves as the test-time action proposer.
  • Figure 5: Qualitative comparison under weather-induced domain shift (Sunny-only$\rightarrow$Rainy). All methods are trained on Sunny-only data and evaluated in Rainy conditions. LAW misses the lead vehicle, causing a severe frontal collision. WoTE and SimLingo reduce severity through evasive maneuvers but still collide due to degraded perception–decision reliability and weak safety-margin enforcement. RaWMPC avoids collisions by selecting the minimum-risk predictive-control action under uncertainty.
  • ...and 1 more figures
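
The three driving modes named in the Figure 3 caption (rand, bad, good) can be sketched as a sampler over cost-ranked candidates. This is a rough illustration under stated assumptions, not the paper's procedure: the caption does not say how the high-cost and low-cost regions are delimited, so the sketch assumes each region is a fixed fraction `frac` of the cost-sorted candidates, and the function name `sample_candidate` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_candidate(
    costs: Sequence[float],
    mode: str,
    frac: float = 0.25,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the index of one candidate action, chosen according to `mode`.

    Assumption: the "region" for the good and bad modes is the lowest-cost
    or highest-cost `frac` of the candidates (at least one candidate).
    """
    rng = rng or random.Random()
    # Candidate indices ordered from lowest to highest cost.
    order: List[int] = sorted(range(len(costs)), key=lambda i: costs[i])
    k = max(1, int(len(costs) * frac))
    if mode == "rand":
        pool = order          # every candidate is eligible
    elif mode == "good":
        pool = order[:k]      # low-cost region
    elif mode == "bad":
        pool = order[-k:]     # high-cost region
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return rng.choice(pool)


if __name__ == "__main__":
    demo_costs = [5.0, 0.1, 9.0, 0.2, 7.0, 3.0, 8.0, 1.0]
    demo_rng = random.Random(0)
    for demo_mode in ("rand", "bad", "good"):
        print(demo_mode, sample_candidate(demo_costs, demo_mode, rng=demo_rng))
```

Deliberately sampling from the high-cost region matches the abstract's stated goal of exposing the world model to hazardous behaviors during interaction, so that their outcomes become predictable.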