QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving
Sourav Biswas, Sergio Casas, Quinlan Sykora, Ben Agro, Abbas Sadat, Raquel Urtasun
TL;DR
QuAD reframes planning for autonomous driving by replacing the perception–prediction–planning cascade with a query-based occupancy framework. It samples a small set of candidate trajectories, builds a bird's-eye-view (BEV) latent scene representation, and queries a continuous occupancy model only at the spatio-temporal points relevant to those candidates, then ranks the trajectories with a weighted, interpretable cost. Training combines dense occupancy supervision with imitation learning and dataset aggregation to obtain robust, safe closed-loop behavior. The approach outperforms state-of-the-art baselines on safety, compliance, and imitation while keeping runtime favorable. The results indicate that aggressive query-point quantization combined with continuous occupancy querying can deliver practical, interpretable motion planning for real-time driving in complex highway scenarios.
Abstract
A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been used to understand free space. However, predicting a grid for the entire scene is wasteful, since only certain spatio-temporal regions are reachable and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm to have the planner query occupancy at relevant spatio-temporal points, restricting the computation to those regions of interest. Exploiting this representation, we evaluate candidate trajectories on key factors such as collision avoidance, comfort, and progress for safety and interpretability. Our approach achieves better highway driving quality than the state-of-the-art in high-fidelity closed-loop simulations.
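
The query-then-rank idea described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_occupancy` is a hand-written analytic stand-in for the learned occupancy model (the paper queries a learned model over a BEV latent, which is not reproduced here), and the candidate sampler, the three cost terms, and the weights are hypothetical choices rather than the paper's. The point is only that occupancy is evaluated at the (x, y, t) points the candidates actually visit, not over a full scene grid.

```python
import numpy as np

def toy_occupancy(points):
    """Hypothetical stand-in for a learned occupancy model.

    points: (..., 3) array of (x, y, t) queries.
    Returns a value in [0, 1] per query. The scene holds one obstacle in the
    lane y = 0 that starts at x = 20 m and moves forward at 5 m/s.
    """
    x, y, t = points[..., 0], points[..., 1], points[..., 2]
    cx = 20.0 + 5.0 * t                      # obstacle centre at time t
    d2 = ((x - cx) / 4.0) ** 2 + (y / 1.0) ** 2
    return np.exp(-0.5 * d2)

def make_candidates(speeds, lateral_offsets, horizon=5.0, steps=11):
    """Sample a small set of constant-speed candidates, shape (K, T, 3)."""
    t = np.linspace(0.0, horizon, steps)
    s = t / horizon
    blend = 3 * s**2 - 2 * s**3              # smooth lateral shift from 0 to 1
    trajs = []
    for v in speeds:
        for y_end in lateral_offsets:
            trajs.append(np.stack([v * t, y_end * blend, t], axis=-1))
    return np.stack(trajs)

def score(trajs, occupancy_fn, weights):
    """Weighted sum of per-term costs; the terms are returned for inspection."""
    occ = occupancy_fn(trajs)                # queried only at trajectory points
    dt = trajs[0, 1, 2] - trajs[0, 0, 2]
    vel = np.diff(trajs[..., :2], axis=1) / dt
    acc = np.diff(vel, axis=1) / dt
    terms = {
        "collision": occ.max(axis=1),                  # worst occupancy hit
        "comfort": (acc ** 2).sum(axis=(1, 2)),        # squared acceleration
        "progress": -(trajs[:, -1, 0] - trajs[:, 0, 0]),  # reward distance
    }
    total = sum(weights[name] * terms[name] for name in weights)
    return total, terms

# Illustrative weights, not taken from the paper.
weights = {"collision": 100.0, "comfort": 0.1, "progress": 0.1}
candidates = make_candidates(speeds=[10.0, 15.0], lateral_offsets=[0.0, 3.5])
total, terms = score(candidates, toy_occupancy, weights)
best = int(np.argmin(total))
print("best candidate:", best, "cost terms:",
      {name: float(values[best]) for name, values in terms.items()})
```

Because each term is kept separately, the selected trajectory can be explained by its individual costs, which is the sense in which a weighted cost of this kind is interpretable.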
