Autonomous Vehicle Path Planning by Searching With Differentiable Simulation
Asen Nachkov, Jan-Nico Zaech, Danda Pani Paudel, Xi Wang, Luc Van Gool
TL;DR
This work addresses planning for autonomous driving when all core modules must be learned. It proposes Differentiable Simulation for Search (DSS), which uses the differentiable Waymax simulator as both a next-state predictor and a critic, enabling gradient-based search over imagined action sequences. A classifier-guided action selection component approximates non-differentiable events such as collisions and offroad occurrences, so that actions can be optimized smoothly at test time. Empirically, DSS shows large gains in tracking accuracy (average displacement error, ADE) and path-planning metrics over strong baselines and state-of-the-art methods on Waymax/WOMD scenarios, while keeping a practical compute footprint (~4 seconds per scenario on an RTX 3090 for long horizons). These results highlight the practical potential of test-time planning with differentiable dynamics for safer, more human-like autonomous driving, with scope for extension to additional simulators and multi-agent dynamics.
Abstract
Planning allows an agent to safely refine its actions before executing them in the real world. In autonomous driving, this is crucial for avoiding collisions and navigating complex, dense traffic scenarios. One way to plan is to search for the best action sequence. However, this is challenging when all necessary components (policy, next-state predictor, and critic) have to be learned. Here we propose Differentiable Simulation for Search (DSS), a framework that leverages the differentiable simulator Waymax as both a next-state predictor and a critic. It relies on the simulator's hardcoded dynamics, which makes state predictions highly accurate, while using the simulator's differentiability to search effectively across action sequences. Our DSS agent optimizes its actions using gradient descent over imagined future trajectories. We show experimentally that DSS, the combination of planning gradients and stochastic search, significantly improves tracking and path-planning accuracy compared to sequence prediction, imitation learning, model-free RL, and other planning methods.
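The idea of optimizing actions by gradient descent over imagined trajectories, combined with stochastic search, can be illustrated with a minimal JAX sketch. This is not the Waymax API and not the authors' implementation: a toy point-mass model stands in for the simulator's hardcoded dynamics, a tracking loss against a reference path stands in for the critic, and every name here (`step`, `rollout`, `tracking_loss`, `refine`, `search`, `DT`, the learning rate, and the candidate count) is a hypothetical choice made for illustration. The classifier-guided handling of collisions and offroad events is not modeled.

```python
import jax
import jax.numpy as jnp

DT = 0.1  # assumed simulation step in seconds


def step(state, accel):
    """Toy point-mass dynamics (assumption: stands in for the simulator)."""
    pos, vel = state
    vel = vel + accel * DT
    pos = pos + vel * DT
    return (pos, vel), pos


def rollout(actions, init_state):
    """Imagine a future trajectory by unrolling the dynamics over the actions."""
    _, positions = jax.lax.scan(step, init_state, actions)
    return positions


def tracking_loss(actions, init_state, reference):
    """Critic stand-in: mean squared distance to the reference positions."""
    positions = rollout(actions, init_state)
    return jnp.mean(jnp.sum((positions - reference) ** 2, axis=-1))


def refine(actions, init_state, reference, n_steps=200, lr=3.0):
    """Gradient descent on the action sequence, differentiating through the rollout."""
    grad_fn = jax.grad(tracking_loss)

    def body(_, a):
        return a - lr * grad_fn(a, init_state, reference)

    return jax.lax.fori_loop(0, n_steps, body, actions)


def search(key, init_state, reference, n_candidates=8):
    """Stochastic search: sample candidates, refine each, keep the best one."""
    horizon = reference.shape[0]
    candidates = jax.random.normal(key, (n_candidates, horizon, 2))
    refined = jax.vmap(refine, in_axes=(0, None, None))(
        candidates, init_state, reference
    )
    losses = jax.vmap(tracking_loss, in_axes=(0, None, None))(
        refined, init_state, reference
    )
    best = jnp.argmin(losses)
    return refined[best], losses[best]


# Usage: track a path that drifts sideways while moving forward at 5 m/s.
times = DT * jnp.arange(1, 21)
reference = jnp.stack([5.0 * times, 0.5 * times**2], axis=-1)
init_state = (jnp.zeros(2), jnp.array([5.0, 0.0]))
best_actions, best_loss = search(jax.random.PRNGKey(0), init_state, reference)
```

Because the rollout is differentiable end to end, the gradient of the loss with respect to every action in the sequence comes from a single backward pass. Refining several random candidates in parallel with `jax.vmap` is one simple way to add stochastic search on top of that gradient signal.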
