Autonomous Vehicle Path Planning by Searching With Differentiable Simulation

Asen Nachkov, Jan-Nico Zaech, Danda Pani Paudel, Xi Wang, Luc Van Gool

TL;DR

This work addresses planning for autonomous driving, which is challenging when all core modules must be learned. It proposes Differentiable Simulation for Search (DSS), which uses the differentiable Waymax simulator as both a next-state predictor and a critic to enable gradient-based search over imagined action sequences. A classifier-guided action selection component approximates non-differentiable events such as collisions and offroad occurrences, allowing smooth optimization of actions at test time. Empirically, DSS demonstrates large gains in tracking accuracy (ADE) and path-planning metrics over strong baselines and state-of-the-art methods on Waymax/WOMD scenarios, while maintaining a practical compute footprint (~4 seconds per scenario on an RTX 3090 for long horizons). These results highlight the practical potential of test-time planning with differentiable dynamics for safer, more human-like autonomous driving behavior, with scope for extension to additional simulators and multi-agent dynamics.

Abstract

Planning allows an agent to safely refine its actions before executing them in the real world. In autonomous driving, this is crucial to avoid collisions and navigate in complex, dense traffic scenarios. One way to plan is to search for the best action sequence. However, this is challenging when all necessary components (policy, next-state predictor, and critic) have to be learned. Here we propose Differentiable Simulation for Search (DSS), a framework that leverages the differentiable simulator Waymax as both a next-state predictor and a critic. It relies on the simulator's hardcoded dynamics, making state predictions highly accurate, while utilizing the simulator's differentiability to effectively search across action sequences. Our DSS agent optimizes its actions using gradient descent over imagined future trajectories. We show experimentally that DSS, the combination of planning gradients and stochastic search, significantly improves tracking and path planning accuracy compared to sequence prediction, imitation learning, model-free RL, and other planning methods.
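
The core idea of optimizing actions by gradient descent over an imagined trajectory can be illustrated with a minimal sketch. It is not the paper's implementation: the dynamics below are a toy stand-in for Waymax (positions are the running sum of 2D displacement actions), the loss is a plain waypoint-tracking loss, and the gradient is derived by hand rather than by automatic differentiation. All names (`rollout`, `tracking_loss`, `loss_gradient`, `refine_actions`) and the step size and iteration count are assumptions made for illustration.

```python
# Toy stand-in for a differentiable simulator (assumption: NOT Waymax).
# A state is a 2D position; an action is a 2D displacement for one step.

def rollout(start, actions):
    """Imagine a future trajectory by applying each action in turn."""
    x, y = start
    states = []
    for ax, ay in actions:
        x, y = x + ax, y + ay
        states.append((x, y))
    return states


def tracking_loss(states, waypoints):
    """Sum of squared distances between imagined states and waypoints."""
    return sum((sx - wx) ** 2 + (sy - wy) ** 2
               for (sx, sy), (wx, wy) in zip(states, waypoints))


def loss_gradient(states, waypoints):
    """Gradient of tracking_loss with respect to every action.

    Action k shifts all states from step k onward, so its gradient is the
    sum of 2 * (s_t - w_t) over t >= k, accumulated here from the end.
    """
    grads = []
    gx = gy = 0.0
    for (sx, sy), (wx, wy) in zip(reversed(states), reversed(waypoints)):
        gx += 2.0 * (sx - wx)
        gy += 2.0 * (sy - wy)
        grads.append((gx, gy))
    grads.reverse()
    return grads


def refine_actions(start, actions, waypoints, steps=300, lr=0.05):
    """Plain gradient descent on the action sequence through the rollout."""
    for _ in range(steps):
        grads = loss_gradient(rollout(start, actions), waypoints)
        actions = [(ax - lr * gx, ay - lr * gy)
                   for (ax, ay), (gx, gy) in zip(actions, grads)]
    return actions


start = (0.0, 0.0)
waypoints = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.0), (4.0, 1.5)]
initial_actions = [(0.0, 0.0)] * len(waypoints)

initial_loss = tracking_loss(rollout(start, initial_actions), waypoints)
refined_actions = refine_actions(start, initial_actions, waypoints)
refined_loss = tracking_loss(rollout(start, refined_actions), waypoints)
print(initial_loss, refined_loss)
```

Because the state predictor here is exact and differentiable, every gradient step moves the whole imagined trajectory toward the waypoints. With a real differentiable simulator the hand-written `loss_gradient` would presumably be replaced by automatic differentiation through the simulator step.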

Paper Structure

This paper contains 10 sections, 4 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: Differentiable simulation at test time. To select the current action, the agent uses a differentiable simulator to imagine a future trajectory ($\mathbf{s}_t, ..., \mathbf{s}_{t+3}$, gray circles), sampled from a distribution of possible states (arcs, white circles). The imagined trajectory is refined using gradient descent towards an optimal one ($\hat{\mathbf{s}}_{t+1}, ..., \hat{\mathbf{s}}_{t+3}$, green circles).
  • Figure 2: Gradient descent in the ego-agent's imagined future. Without gradients (left), searching involves sampling $K$ trajectories (gray) of length $T$, scoring them, and aggregating their first $M$ actions (bold black line). The trajectory from the resulting actions is shown in orange. With gradients, each rolled-out trajectory is first updated towards an optimum (green), as judged by the planning loss. The executed trajectory from the aggregated actions more closely aligns with this optimal trajectory.
  • Figure 3: Experiment ablation tree. Depending on whether the agent searches and whether it uses gradients, there are four experimental settings. Our full framework, DSS, both searches across multiple trajectories and uses gradients to optimize the actions along them.
  • Figure 4: Planning loss design. The planning loss includes collision and offroad events. It does not supervise the last imagined location at time $T$ with the last waypoint at time $L$.
  • Figure 5: Driving by planning. The ego-vehicle is blue. All boxes are shown at their initial positions and the gray lines indicate their future motion (crossing lines do not imply collisions). Red dots are red lights. The ego-agent accurately navigates the intersection by periodically planning out its actions through imagination of the future (shown in purple).
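
The search procedure described in the Figure 2 caption (sample $K$ trajectories of length $T$, optionally refine each with gradients, score them, and aggregate their first $M$ actions) can be sketched as follows. This is a hedged illustration, not the authors' code: the dynamics are a toy stand-in for Waymax, the Gaussian proposals stand in for whatever supplies candidate actions, the loss covers waypoint tracking only (no collision or offroad terms), and the softmax-weighted average is an assumed aggregation rule, since the captions do not specify one. All function names and default values are hypothetical.

```python
import math
import random

# Toy differentiable dynamics (assumption: NOT Waymax): positions are the
# running sum of 2D displacement actions.

def rollout(start, actions):
    x, y = start
    states = []
    for ax, ay in actions:
        x, y = x + ax, y + ay
        states.append((x, y))
    return states


def planning_loss(states, waypoints):
    """Hypothetical planning loss: squared distance to the waypoints only."""
    return sum((sx - wx) ** 2 + (sy - wy) ** 2
               for (sx, sy), (wx, wy) in zip(states, waypoints))


def loss_gradient(states, waypoints):
    """Hand-derived gradient of planning_loss w.r.t. each action."""
    grads, gx, gy = [], 0.0, 0.0
    for (sx, sy), (wx, wy) in zip(reversed(states), reversed(waypoints)):
        gx += 2.0 * (sx - wx)
        gy += 2.0 * (sy - wy)
        grads.append((gx, gy))
    return grads[::-1]


def search(start, waypoints, K=16, M=2, grad_steps=0, lr=0.05, seed=0):
    """Sample K action sequences of length T, optionally refine each with
    gradient descent, score them, and aggregate their first M actions."""
    rng = random.Random(seed)
    T = len(waypoints)
    scored = []
    for _ in range(K):
        # Assumed stochastic proposal: Gaussian noise around "drive forward".
        actions = [(rng.gauss(1.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(T)]
        for _ in range(grad_steps):
            grads = loss_gradient(rollout(start, actions), waypoints)
            actions = [(ax - lr * gx, ay - lr * gy)
                       for (ax, ay), (gx, gy) in zip(actions, grads)]
        scored.append((planning_loss(rollout(start, actions), waypoints), actions))

    # Assumed aggregation rule: softmax over negative loss, then a weighted
    # average of the first M actions across the K candidates.
    best = min(loss for loss, _ in scored)
    weights = [math.exp(best - loss) for loss, _ in scored]
    total = sum(weights)
    return [(sum(w * a[m][0] for w, (_, a) in zip(weights, scored)) / total,
             sum(w * a[m][1] for w, (_, a) in zip(weights, scored)) / total)
            for m in range(M)]


def executed_error(start, plan, waypoints):
    """Tracking error of the states reached by executing the aggregated plan."""
    return planning_loss(rollout(start, plan), waypoints[:len(plan)])


start = (0.0, 0.0)
waypoints = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.0), (4.0, 1.5)]

plan_without_grad = search(start, waypoints, grad_steps=0)
plan_with_grad = search(start, waypoints, grad_steps=300)
error_without_grad = executed_error(start, plan_without_grad, waypoints)
error_with_grad = executed_error(start, plan_with_grad, waypoints)
print(error_without_grad, error_with_grad)
```

Both calls use the same seed, so they start from identical sampled trajectories. The only difference is whether each candidate is refined before scoring and aggregation, which mirrors the left and gradient-based settings contrasted in Figure 2.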