Vehicle Dynamics Embedded World Models for Autonomous Driving

Huiqian Li, Wei Pan, Haodong Zhang, Jin Huang, Zhihua Zhong

TL;DR

This paper tackles the robustness problem of world-model-based autonomous driving under varying ego-vehicle dynamics. It introduces Vehicle Dynamics embedded Dreamer (VDD), which decouples ego dynamics from environmental dynamics using a hierarchical context-aware latent state space, and augments learning with Policy Adjustment during Deployment (PAD) and Policy Augmentation during Training (PAT). Empirical results in MetaDrive and CARLA show that VDD improves driving performance and robustness to vehicle-dynamics shifts, outperforming baselines like DreamerV3 in several scenarios, especially on complex roundabouts. The work also analyzes ablations and sensitivity, highlighting the importance of trajectory imagination, reachability constraints, and context-aware planning for cross-vehicle generalization. Limitations include mixed results on real-world datasets and the need for offline pretraining and sim-to-real strategies, with future work aimed at continual online dynamics learning and fault-tolerant decision-making.

Abstract

World models have gained significant attention as a promising approach for autonomous driving. By emulating human-like perception and decision-making processes, these models can predict and adapt to dynamic environments. Existing methods typically map high-dimensional observations into compact latent spaces and learn optimal policies within these latent representations. However, prior work usually jointly learns ego-vehicle dynamics and environmental transition dynamics from the image input, leading to inefficiencies and a lack of robustness to variations in vehicle dynamics. To address these issues, we propose the Vehicle Dynamics embedded Dreamer (VDD) method, which decouples the modeling of ego-vehicle dynamics from environmental transition dynamics. This separation allows the world model to generalize effectively across vehicles with diverse parameters. Additionally, we introduce two strategies to further enhance the robustness of the learned policy: Policy Adjustment during Deployment (PAD) and Policy Augmentation during Training (PAT). Comprehensive experiments in simulated environments demonstrate that the proposed model significantly improves both driving performance and robustness to variations in vehicle dynamics, outperforming existing approaches.
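To make the decoupling idea concrete, the sketch below keeps ego-vehicle dynamics in an explicit, parameterized function that is separate from any learned environment model. It is only an illustration under stated assumptions: the kinematic bicycle model, the `VehicleParams` fields, and every name here are hypothetical stand-ins, not the dynamics model or interfaces used in the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleParams:
    """Hypothetical ego-vehicle parameters (not taken from the paper)."""
    wheelbase: float  # metres
    max_steer: float  # radians


@dataclass(frozen=True)
class EgoState:
    x: float
    y: float
    yaw: float
    speed: float


def ego_step(state, accel, steer, params, dt=0.1):
    """One kinematic-bicycle update; an assumed stand-in for ego dynamics."""
    # Clamp the steering command to the vehicle's physical limit.
    steer = max(-params.max_steer, min(params.max_steer, steer))
    x = state.x + state.speed * math.cos(state.yaw) * dt
    y = state.y + state.speed * math.sin(state.yaw) * dt
    yaw = state.yaw + state.speed / params.wheelbase * math.tan(steer) * dt
    speed = max(0.0, state.speed + accel * dt)
    return EgoState(x, y, yaw, speed)


def rollout(state, actions, params, dt=0.1):
    """Apply a sequence of (accel, steer) actions and return all visited states."""
    states = [state]
    for accel, steer in actions:
        states.append(ego_step(states[-1], accel, steer, params, dt))
    return states


# The same action sequence applied to two vehicles with different wheelbases.
compact = VehicleParams(wheelbase=2.5, max_steer=0.6)
truck = VehicleParams(wheelbase=5.0, max_steer=0.6)
start = EgoState(0.0, 0.0, 0.0, 5.0)
actions = [(0.0, 0.3)] * 20

traj_compact = rollout(start, actions, compact)
traj_truck = rollout(start, actions, truck)
yaw_compact = traj_compact[-1].yaw
yaw_truck = traj_truck[-1].yaw
```

Under these assumptions, identical actions turn the short-wheelbase vehicle twice as far as the long-wheelbase one. A model that folds ego dynamics into an image-based transition would have to relearn that difference for each vehicle, whereas a separate ego-dynamics component only needs its parameters changed.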

Paper Structure

This paper contains 33 sections, 22 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Illustration of the research motivation of this paper.
  • Figure 2: Recurrent state space model (top) and hierarchical context-aware recurrent state space model (bottom).
  • Figure 3: The world model learning process (left) and behavior learning process (right) of the proposed VDD model. The model receives bird’s-eye view (BEV) images as sensory inputs $o_t$, encoding them into discrete stochastic representations $z_t$. A sequential model with recurrent state $h_t$ then predicts the sequence of these representations based on the previous dynamics state of the ego vehicle, $s^i_{t-1}$. The actor and critic modules predict goals $g_t$ and values $v_t$, learning from trajectories of abstract representations generated by the world model. The controller of the ego vehicle subsequently produces the action $a_t$.
  • Figure 4: Policy adjustment (top) and policy augmentation (bottom) strategies to improve the robustness to vehicle dynamics shifts.
  • Figure 5: Map visualizations (left column) and observations (center and right columns) used in the experiments. The first two rows show the roundabout (top row) and an example from the nuPlan dataset (middle row) in the MetaDrive simulator. Each BEV observation frame consists of two channels: one representing the road geometry and navigation information (center column) and the other representing surrounding vehicles (right column). The bottom row shows the intersection in the CARLA simulator, where each observation frame consists of a route map image and a front camera image.
  • ...and 9 more figures