Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models

Asen Nachkov, Danda Pani Paudel, Jan-Nico Zaech, Davide Scaramuzza, Luc Van Gool

TL;DR

This work extends differentiable simulation (DiffSim) from policy learning to world modeling by introducing Analytic World Models (AWMs), which give an autonomous driving agent predictive, prescriptive, and counterfactual capabilities. By embedding three predictor tasks (relative odometry, optimal planning, and inverse optimal state estimation) within an end-to-end differentiable graph built on Waymax, the approach enables efficient learning by backpropagating through the environment dynamics. The AWMs, trained alongside a policy under Analytic Policy Gradients, achieve stronger reactive performance, accurate imagined futures, and useful confidence signals, and they enable model-based action selection via MPC. Overall, the method demonstrates that DiffSim can substantially enhance decision-making beyond reactive control in autonomous driving, with potential for broader, real-time, end-to-end world modeling.

Abstract

Differentiable simulators represent an environment's dynamics as a differentiable function. Within robotics and autonomous driving, this property is used in Analytic Policy Gradients (APG), which relies on backpropagating through the dynamics to train accurate policies for diverse tasks. Here we show that differentiable simulation also has an important role in world modeling, where it can impart predictive, prescriptive, and counterfactual capabilities to an agent. Specifically, we design three novel task setups in which the differentiable dynamics are combined within an end-to-end computation graph not with a policy, but a state predictor. This allows us to learn relative odometry, optimal planners, and optimal inverse states. We collectively call these predictors Analytic World Models (AWMs) and demonstrate how differentiable simulation enables their efficient, end-to-end learning. In autonomous driving scenarios, they have broad applicability and can augment an agent's decision-making beyond reactive control.
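To make the idea of backpropagating through the dynamics concrete, the following is a minimal JAX sketch of an APG-style training loop. It is an illustration under stated assumptions, not the paper's implementation: the point-mass `step` function, the linear `policy`, the step size `DT`, the learning rate, and the constant-acceleration "expert" are all hypothetical stand-ins for Waymax dynamics, a neural policy, and logged expert trajectories.

```python
import jax
import jax.numpy as jnp

DT = 0.1  # hypothetical simulation step size


def step(state, action):
    """Toy differentiable dynamics: 1-D point mass, state = (position, velocity)."""
    pos, vel = state[0], state[1]
    vel = vel + action * DT
    pos = pos + vel * DT
    return jnp.stack([pos, vel])


def policy(params, state):
    """Linear feedback policy with a bias term (stand-in for a neural network)."""
    return params[0] * state[0] + params[1] * state[1] + params[2]


def expert_rollout(init_state, num_steps, accel=1.0):
    """Made-up expert: constant acceleration, simulated with the same dynamics."""
    def body(state, _):
        nxt = step(state, accel)
        return nxt, nxt[0]
    _, positions = jax.lax.scan(body, init_state, None, length=num_steps)
    return positions


def rollout_loss(params, init_state, expert_positions):
    """Unroll policy and dynamics together; the loss lives in the outcome (state) space."""
    def body(state, target):
        nxt = step(state, policy(params, state))
        return nxt, (nxt[0] - target) ** 2
    _, errors = jax.lax.scan(body, init_state, expert_positions)
    return jnp.mean(errors)


init_state = jnp.array([0.0, 0.0])
expert_positions = expert_rollout(init_state, num_steps=20)
params = jnp.zeros(3)

# jax.grad differentiates through every simulator step in the unrolled trajectory.
grad_fn = jax.jit(jax.grad(rollout_loss))
initial_loss = rollout_loss(params, init_state, expert_positions)
for _ in range(200):
    params = params - 0.02 * grad_fn(params, init_state, expert_positions)
final_loss = rollout_loss(params, init_state, expert_positions)
```

The design point is that the gradient of the trajectory error reaches the policy parameters through `step` itself, so no separate learned dynamics model or score-function estimator is needed in this toy setting.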

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: Differentiable simulation for world modeling. Previously, differentiable simulation has been used to train controllers using analytic policy gradients (bottom). Our contribution is to apply it to learning relative odometry, state planning, and inverse state estimation (top).
  • Figure 2: The benefits of differentiable simulation. Methods that do not use DiffSim, e.g. behavior cloning (shown in red), are trained to minimize a loss in the action space. If the dynamics are nonlinear (here with a jump at the action $\bar{a}$), a small error in the action can produce a large error in the outcome, so the resulting outcome distribution can be poor. DiffSim-based methods (blue) minimize a loss directly in the outcome space, and the learned action distributions are tighter.
  • Figure 3: Predictions from the relative odometry. We condition the ego-agent (blue) to go offroad, turn, or accelerate. The imagined trajectories, shown as scattered colored circles, are the predicted locations of the ego-vehicle over the next 1 second, plotted in different colors at times 1s, 2s, ..., 7s throughout the episode. They align with the actual realized trajectory, which implies that the agent can imagine its future motion accurately. The ground-truth historical trajectory is added for reference.
  • Figure 4: Trajectories obtained by using the optimal planner. They are realistic and resemble those from the policy. Training such planners is possible because the dynamics are analytically available and inverse kinematics can be used to drive the action selection.
  • Figure 5: Realized trajectory colored according to the log-norm of the predicted inverse state displacements. Since the ego-vehicle drives faster than the expert, the norm of the optimal inverse state predictions gradually increases.
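
The argument in Figure 2 can be illustrated with a small JAX sketch. Everything in it is an assumption made for illustration: the scalar `dynamics` function, the jump location `A_BAR`, the steepness of the jump, and the chosen actions are hypothetical and not taken from the paper. Two actions equally far from the expert's action receive the same action-space loss, yet their outcome-space losses differ greatly once one of them crosses the jump.

```python
import jax
import jax.numpy as jnp

A_BAR = 0.5  # hypothetical action at which the dynamics change sharply


def dynamics(action):
    """Toy nonlinear outcome: near-linear, with a steep but smooth jump at A_BAR."""
    return action + 2.0 * jax.nn.sigmoid(50.0 * (action - A_BAR))


expert_action = 0.45
expert_outcome = dynamics(expert_action)


def action_loss(action):
    """Behavior-cloning style loss, measured in the action space."""
    return (action - expert_action) ** 2


def outcome_loss(action):
    """DiffSim style loss, measured in the outcome space after applying the dynamics."""
    return (dynamics(action) - expert_outcome) ** 2


# Two candidate actions at the same distance (0.1) from the expert action.
a_low, a_high = 0.35, 0.55

same_action_error = jnp.allclose(action_loss(a_low), action_loss(a_high))
outcome_ratio = outcome_loss(a_high) / outcome_loss(a_low)

# The outcome-space gradient is available because the dynamics are differentiable;
# at a_high it is positive, so gradient descent moves the action back below the jump.
grad_at_high = jax.grad(outcome_loss)(a_high)
```

In this toy setting the action-space loss cannot distinguish `a_low` from `a_high`, while the outcome-space loss penalizes `a_high` far more heavily, which is the mechanism the caption attributes to DiffSim-based training.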