Dream to Drive with Predictive Individual World Model

Yinfeng Gao, Qichao Zhang, Da-wei Ding, Dongbin Zhao

TL;DR

This work tackles reactive decision-making in urban autonomous driving where other road users’ intentions are unknown. It introduces Predictive Individual World Model (PIWM), which encodes driving scenes at an individual-vehicle level with branched encoders and self-attention to model interactions, and learns intention-aware latent states through a trajectory-prediction objective. A separate behavior model is trained within the world model’s imagination, using cross-attention to fuse ego and nearby vehicle states for discrete speed control. Empirical results on INTERACTION-derived scenarios show that PIWM improves learning efficiency and outperforms both model-free baselines and DreamerV3 across small- and large-scale benchmarks, with ablations confirming the value of individual modeling and interactive prediction. The approach enhances interpretability by decoding predicted trajectories and paves the way for scalable, intention-aware autonomous driving in complex traffic.
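The following is a minimal pure-Python sketch of the individual-level encoding, not the authors' implementation. It makes several assumptions. There are two hypothetical branches: one for the ego vehicle and one shared by all social vehicles. Each vehicle has a 4-dimensional state (x, y, vx, vy). Randomly initialized single-layer encoders stand in for the paper's MLPs. The attention step is single-head scaled dot-product self-attention with identity query/key/value projections. Names such as `encode_scene`, `ego_branch`, and `social_branch` are illustrative only.

```python
import math
import random

def init_layer(in_dim, out_dim, rng):
    """Randomly initialise a dense layer (illustrative weights, not trained)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def linear(weights, bias, x):
    """Compute W x + b with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention (Q = K = V = tokens)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is an attention-weighted mix of all vehicle embeddings.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def encode_scene(ego_state, social_states, ego_branch, social_branch):
    """Embed each vehicle with its class-specific branch, then mix via self-attention."""
    tokens = [linear(*ego_branch, ego_state)]
    tokens += [linear(*social_branch, s) for s in social_states]
    return self_attention(tokens)

rng = random.Random(0)
STATE_DIM, EMBED_DIM = 4, 8  # hypothetical sizes
ego_branch = init_layer(STATE_DIM, EMBED_DIM, rng)
social_branch = init_layer(STATE_DIM, EMBED_DIM, rng)

ego = [0.0, 0.0, 5.0, 0.0]
social = [[10.0, 3.5, -4.0, 0.0], [-8.0, 0.0, 6.0, 0.0]]
features = encode_scene(ego, social, ego_branch, social_branch)
print(len(features), len(features[0]))  # 3 8
```

Each vehicle keeps its own interaction-aware feature vector instead of being merged into one scene-level state. A trained model would learn the attention projections rather than using identity ones.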

Abstract

Generating reactive driving behavior in complex urban environments remains challenging because the intentions of other road users are unknown. Model-based reinforcement learning (MBRL) offers great potential for learning a reactive policy by constructing a world model that provides informative states and supports imagination training. However, a critical limitation of related research lies in scene-level, reconstruction-based representation learning, which may overlook key interactive vehicles and struggles to model the interactions among vehicles and their long-term intentions. Therefore, this paper presents a novel MBRL method with a predictive individual world model (PIWM) for autonomous driving. PIWM describes the driving environment from an individual-level perspective and captures vehicles' interactive relations and intentions via a trajectory prediction task. Meanwhile, a behavior policy is learned jointly with PIWM; it is trained in PIWM's imagination and effectively navigates urban driving scenes by leveraging intention-aware latent states. The proposed method is trained and evaluated in simulation environments built upon challenging real-world interactive scenarios. Compared with popular model-free and state-of-the-art model-based reinforcement learning methods, the proposed method achieves the best performance in terms of safety and efficiency.
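The main departure from reconstruction-based world models is the training target. Each vehicle's latent state is decoded into future waypoints and scored against the logged future trajectory. The sketch below assumes a plain squared-error loss over (x, y) waypoints with a shared prediction horizon. The paper's exact loss, horizon, and output parameterization are not given here, so `trajectory_prediction_loss`, `displacement_errors`, and the vehicle IDs are hypothetical.

```python
import math

def displacement_errors(predicted, target):
    """Per-step Euclidean error between predicted and ground-truth (x, y) waypoints."""
    if len(predicted) != len(target):
        raise ValueError("trajectories must cover the same horizon")
    return [math.dist(p, t) for p, t in zip(predicted, target)]

def trajectory_prediction_loss(pred_by_vehicle, future_by_vehicle):
    """Mean squared displacement over all vehicles and all future steps.

    Hypothetical stand-in for a prediction objective: instead of reconstructing
    the current observation, decoded future waypoints are compared with the
    logged future motion of every vehicle.
    """
    total, count = 0.0, 0
    for vid, future in future_by_vehicle.items():
        errors = displacement_errors(pred_by_vehicle[vid], future)
        total += sum(e * e for e in errors)
        count += len(errors)
    if count == 0:
        raise ValueError("no future waypoints to score")
    return total / count

future = {"ego": [(1.0, 0.0), (2.0, 0.0)], "car_1": [(0.0, 1.0), (0.0, 2.0)]}
pred = {"ego": [(1.0, 0.0), (2.0, 1.0)], "car_1": [(0.0, 1.0), (0.0, 2.0)]}
print(trajectory_prediction_loss(pred, future))  # 0.25
```

Because the target lies in the future, minimizing this loss pushes the latent state to encode where each vehicle is heading, not only where it currently is.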

Paper Structure

This paper contains 36 sections, 13 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Comparison of Dreamer [hafner2023dreamerv3] and our method in a complex driving scenario with an ego vehicle and several social vehicles. Dreamer learns a scene-level world model whose representation learning focuses on reconstructing the current observation, and its behavior model operates on the mingled state, depicted as a grey circle. In contrast, our method builds the world model in an individual-level framework, where each vehicle in the scene is classified and modeled separately with branched networks and owns a unique state, symbolized by a numbered colored circle. We further improve the individual-level world model by explicitly modeling the relations between vehicles and by replacing reconstruction with trajectory prediction to capture the latent intentions or motion trends of vehicles.
  • Figure 2: The detailed structure of PIWM and the behavior model. Modules drawn with solid lines are branch-specific, while modules drawn with dashed lines are shared between branches. Note that the gradients of the actor and critic are stopped from flowing backward through the latent states, so representation learning happens purely in the world-model learning phase. All network modules are multi-layer perceptrons (MLPs), since no image observations are considered.
  • Figure 3: Typical scenarios of the experiments, where the ego vehicle is red and the social vehicles are blue. The green line indicates the ego vehicle's predefined route and the red line indicates its heading.
  • Figure 4: The workflow of the I-SIM simulator. The simulation can be visualized online to help users better understand the driving environment and the ego vehicle's behavior.
  • Figure 5: Training curves of our proposed method PIWM and other learning-based baselines. PIWM shows superior performance in terms of sample efficiency and final performance on both benchmark scenarios.
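The behavior model in Figure 2 fuses the ego state with nearby-vehicle states via cross-attention and outputs a discrete speed command. The following is a minimal sketch under stated assumptions. The ego latent serves as the query. The neighbor latents serve as both keys and values. The fused feature is the ego latent concatenated with the attended context. A single linear layer then scores a hypothetical set of five target speeds in `SPEED_ACTIONS`. The weights are random placeholders. In the described method they would be trained in PIWM's imagination, with gradients stopped at the latent states; this autodiff-free sketch does not model that. All function names are illustrative.

```python
import math
import random

SPEED_ACTIONS = [0.0, 2.0, 4.0, 6.0, 8.0]  # hypothetical discrete target speeds (m/s)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(query, keys, values):
    """Ego query attends over nearby-vehicle keys/values (scaled dot-product)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return fused, weights

def actor_logits(ego_latent, neighbor_latents, weight_rows, bias):
    """Concatenate the ego latent with the attended neighbor context, then score each action."""
    context, attn = cross_attention(ego_latent, neighbor_latents, neighbor_latents)
    features = ego_latent + context  # list concatenation: 2 * latent_dim features
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weight_rows, bias)]
    return logits, attn

def select_speed(logits, rng=None):
    """Greedy action by default; sample from the categorical policy if an RNG is given."""
    probs = softmax(logits)
    if rng is None:
        idx = max(range(len(probs)), key=probs.__getitem__)
    else:
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return SPEED_ACTIONS[idx], probs

rng = random.Random(1)
LATENT_DIM = 4  # hypothetical latent size
ego = [0.5, -0.2, 0.1, 0.0]
neighbors = [[0.4, -0.1, 0.0, 0.2], [-0.3, 0.8, 0.5, -0.1]]
W = [[rng.uniform(-1, 1) for _ in range(2 * LATENT_DIM)] for _ in SPEED_ACTIONS]
b = [0.0] * len(SPEED_ACTIONS)
logits, attn = actor_logits(ego, neighbors, W, b)
speed, probs = select_speed(logits)
print(speed in SPEED_ACTIONS, round(sum(attn), 6))  # True 1.0
```

Passing an `rng` to `select_speed` samples from the categorical policy, which suits exploration during training. The greedy branch suits evaluation. This split is a common convention rather than something the source specifies.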