
QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving

Sourav Biswas, Sergio Casas, Quinlan Sykora, Ben Agro, Abbas Sadat, Raquel Urtasun

TL;DR

QuAD reframes motion planning for autonomous driving by replacing the perception–prediction–planning cascade with a query-based occupancy framework. It samples a small set of candidate trajectories, builds a bird's-eye-view (BEV) latent representation of the scene, and queries a continuous (implicit) occupancy model only at spatio-temporal points around the candidates' future ego positions. It then ranks the trajectories with a weighted, interpretable cost. Training combines dense occupancy supervision with imitation learning using dataset aggregation, which yields robust and safe closed-loop behavior: QuAD outperforms state-of-the-art baselines in safety, compliance, and imitation while keeping a favorable runtime. The results suggest that aggressive quantization of query points, combined with continuous occupancy queries, makes interpretable motion planning practical for real-time driving in complex highway scenarios.
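
The TL;DR describes ranking candidates with a weighted, interpretable cost. The following is a minimal sketch of that ranking step, assuming three hypothetical cost terms named after the factors the abstract lists (collision, comfort, progress) and hand-picked weights. The `CostTerms` class, the `breakdown` and `select_plan` helpers, the weights, and the numbers are all illustrative; this summary does not say how QuAD defines or sets them.

```python
from dataclasses import dataclass


@dataclass
class CostTerms:
    """Hypothetical per-trajectory cost terms (names are illustrative, not QuAD's)."""
    collision: float  # occupancy-weighted overlap along the plan
    comfort: float    # e.g. accumulated squared acceleration
    progress: float   # negative distance travelled, so more progress is cheaper


def breakdown(terms: CostTerms, weights: dict[str, float]) -> dict[str, float]:
    """Weighted contribution of each term; exposing these is what makes the choice inspectable."""
    return {name: w * getattr(terms, name) for name, w in weights.items()}


def select_plan(candidates: list[CostTerms], weights: dict[str, float]) -> tuple[int, list[float]]:
    """Return the index of the lowest-cost candidate together with every total cost."""
    costs = [sum(breakdown(c, weights).values()) for c in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs


# Illustrative weights and candidates (assumptions, not values from the paper).
weights = {"collision": 10.0, "comfort": 0.5, "progress": 1.0}
candidates = [
    CostTerms(collision=0.8, comfort=0.2, progress=-30.0),  # fast, but into an occupied region
    CostTerms(collision=0.0, comfort=0.4, progress=-25.0),  # lane change into free space
    CostTerms(collision=0.0, comfort=0.1, progress=-15.0),  # slow down in lane
]
best, costs = select_plan(candidates, weights)
print(best, [round(c, 2) for c in costs])
print(breakdown(candidates[best], weights))
```

With these numbers the lane change (index 1) wins: candidate 0 gains 5 units of progress but pays 8 units of weighted collision cost, and the per-term breakdown shows exactly that trade-off.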

Abstract

A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been used to understand free space. However, predicting a grid for the entire scene is wasteful, since only certain spatio-temporal regions are reachable by and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm so that the planner queries occupancy at relevant spatio-temporal points, restricting computation to those regions of interest. Exploiting this representation, we evaluate candidate trajectories in terms of key factors such as collision avoidance, comfort, and progress, which supports both safety and interpretability. Our approach achieves better highway driving quality than the state of the art in high-fidelity closed-loop simulations.
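
The abstract argues that a dense grid over the whole scene is wasteful, and Figure 1 notes that many ego states along the candidate plans lie close to each other. The sketch below counts how many occupancy queries a set of candidate plans needs once query points are snapped to a spatio-temporal grid and deduplicated, and compares that with a dense grid over the same region. The sampler (`candidate_plans`), the 3x3 footprint offsets, the cell sizes, and the region bounds are hypothetical assumptions; the summary does not describe QuAD's actual query-point scheme.

```python
import itertools
import math


def candidate_plans(speeds, lateral_offsets, steps=10, dt=0.5):
    """Hypothetical sampler: constant speed, lateral shift spread linearly over the horizon.
    Each state is (x [m], y [m], k), where k is the time-step index."""
    plans = []
    for v, dy in itertools.product(speeds, lateral_offsets):
        plans.append([(v * k * dt, dy * k / steps, k) for k in range(1, steps + 1)])
    return plans


def quantized_queries(plans, footprint, cell):
    """Expand every ego state by the footprint offsets, snap each point to a
    (cell x cell metres, time step) grid, and deduplicate.
    Returns (raw query count, set of unique grid keys)."""
    raw, unique = 0, set()
    for plan in plans:
        for x, y, k in plan:
            for ox, oy in footprint:
                raw += 1
                unique.add((math.floor((x + ox) / cell), math.floor((y + oy) / cell), k))
    return raw, unique


def dense_grid_size(x_range, y_range, steps, cell):
    """Number of cells a dense occupancy forecast over the same region would need."""
    nx = math.ceil((x_range[1] - x_range[0]) / cell)
    ny = math.ceil((y_range[1] - y_range[0]) / cell)
    return nx * ny * steps


# Illustrative values (assumptions): 4 speeds x 3 lateral targets, a 5 s horizon
# at 0.5 s steps, and a 3x3 set of offsets roughly covering a car footprint.
plans = candidate_plans(speeds=[5.0, 10.0, 15.0, 20.0], lateral_offsets=[-3.5, 0.0, 3.5])
footprint = [(dx, dy) for dx in (-2.0, 0.0, 2.0) for dy in (-1.0, 0.0, 1.0)]

raw, fine = quantized_queries(plans, footprint, cell=1.0)
_, coarse = quantized_queries(plans, footprint, cell=2.0)
dense = dense_grid_size(x_range=(-10.0, 110.0), y_range=(-20.0, 20.0), steps=10, cell=1.0)
print(f"raw queries: {raw}, unique @1 m: {len(fine)}, unique @2 m: {len(coarse)}, dense grid: {dense}")
```

Coarsening the cell can only merge keys, never split them (every 1 m cell falls inside exactly one 2 m cell), so the 2 m count is never larger than the 1 m count. The price is a blurrier occupancy estimate around the ego footprint.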

Paper Structure

This paper contains 58 sections, 11 equations, 10 figures, and 10 tables.

Figures (10)

  • Figure 1: The set of potential plans for the ego vehicle, with time into the future colored from blue to green. Our neural motion planner, QuAD, builds upon two observations: (1) the plans' reachable space is much smaller than the full spatio-temporal volume and (2) many ego states throughout the trajectories are in close proximity to each other.
  • Figure 2: QuAD inference. Given the ego state and the map, the trajectory sampler generates candidate plans. These plans are converted into query points that cover the relevant areas around the ego vehicle's future positions. Leveraging multi-sweep LiDAR and the HD map, a scene encoder builds a BEV latent representation, which we then use to query an implicit occupancy model. Finally, we gather the occupancy relevant to each trajectory, compute its cost, and select the trajectory with the lowest cost.
  • Figure 3: Driving quality vs. runtime comparison
  • Figure 4: Qualitative results. We visualize the LiDAR point cloud, the map, the predicted occupancy (obtained by querying $\psi$ on a regular grid, for illustration only), and the cost of each trajectory sample. From top to bottom: a lane change, a re-incorporation near an off-ramp, and a merge.
  • Figure 5: Illustration of example costs. These costs are evaluated at multiple time steps $t$ along candidate plans, and aggregated to extract the trajectory cost. For agent-aware costs, the cost is modulated by the occupancy probability at the query points.
  • ...and 5 more figures
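
Figure 2 describes gathering the occupancy relevant to each trajectory, and Figure 5 states that agent-aware costs are modulated by the occupancy probability at the query points and aggregated over time steps. The following is a minimal sketch of that step under explicit assumptions: `toy_occupancy` is a hand-written, hypothetical stand-in for the learned implicit model $\psi$; query results are cached per quantized cell so that overlapping plans share evaluations; and the agent-aware cost is simply the occupancy summed over an assumed 3x3 footprint and over all time steps. None of these names or aggregation choices come from the paper.

```python
import math


def toy_occupancy(x, y, t):
    """Hypothetical stand-in for the learned occupancy model: one agent ahead in the
    ego lane, starting at x = 20 m and moving at 8 m/s, with a soft ~2 m extent."""
    d2 = (x - (20.0 + 8.0 * t)) ** 2 + y ** 2
    return math.exp(-d2 / (2.0 * 2.0 ** 2))


def make_cached_query(occupancy, cell=1.0, dt=0.5):
    """Evaluate occupancy once per quantized (x, y, step) cell and reuse it across plans."""
    cache = {}

    def query(x, y, k):
        key = (math.floor(x / cell), math.floor(y / cell), k)
        if key not in cache:
            cx, cy = (key[0] + 0.5) * cell, (key[1] + 0.5) * cell  # cell centre
            cache[key] = occupancy(cx, cy, k * dt)
        return cache[key]

    return query, cache


def agent_aware_cost(plan, footprint, query):
    """Gather occupancy at each footprint query point, sum it per time step,
    then sum over the horizon (a hypothetical choice of aggregation)."""
    per_step = [sum(query(x + ox, y + oy, k) for ox, oy in footprint)
                for k, (x, y) in enumerate(plan, start=1)]
    return sum(per_step), per_step


steps, dt, v = 10, 0.5, 12.0
keep_lane = [(v * k * dt, 0.0) for k in range(1, steps + 1)]
change_left = [(v * k * dt, 3.5 * k / steps) for k in range(1, steps + 1)]
footprint = [(dx, dy) for dx in (-2.0, 0.0, 2.0) for dy in (-1.0, 0.0, 1.0)]

query, cache = make_cached_query(toy_occupancy, cell=1.0, dt=dt)
cost_keep, _ = agent_aware_cost(keep_lane, footprint, query)
cost_change, _ = agent_aware_cost(change_left, footprint, query)
print(f"keep lane: {cost_keep:.2f}, change left: {cost_change:.2f}, "
      f"occupancy evaluations: {len(cache)} of {2 * steps * len(footprint)} lookups")
```

Keeping the lane at 12 m/s closes the gap to the slower agent ahead, so its occupancy-weighted cost is higher than that of the lane change. The first steps of both plans fall into the same cells and are evaluated only once.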