What-If Motion Prediction for Autonomous Driving
Siddhesh Khandelwal, William Qi, Jagjeet Singh, Andrew Hartnett, Deva Ramanan
TL;DR
The paper addresses long-horizon motion forecasting for autonomous driving by introducing WIMP, a recurrent graph-based attentional model that jointly uses interpretable road-network polylines and social actor interactions. It supports counterfactual forecasting conditioned on hypothetical polylines and social contexts, and produces diverse multi-modal predictions with a likelihood-aware decoding strategy. Empirical results on Argoverse and nuScenes show state-of-the-art or competitive performance, and ablations confirm the value of combining map and social context when multiple predictors are trained with an Evolving Winner-Takes-All (EWTA) loss. The approach eases planner integration by providing controllable, interpretable forecasts and a mechanism for reasoning about unobserved or unlikely futures relevant to the AV's plan.
Abstract
Forecasting the long-term future motion of road actors is a core challenge to the deployment of safe autonomous vehicles (AVs). Viable solutions must account for both the static geometric context, such as road lanes, and dynamic social interactions arising from multiple actors. While recent deep architectures have achieved state-of-the-art performance on distance-based forecasting metrics, these approaches produce forecasts without regard to the AV's intended motion plan. In contrast, we propose a recurrent graph-based attentional approach with interpretable geometric (actor-lane) and social (actor-actor) relationships that supports the injection of counterfactual geometric goals and social contexts. Our model can produce diverse predictions conditioned on hypothetical or "what-if" road lanes and multi-actor interactions. We show that such an approach could be used in the planning loop to reason about unobserved causes or unlikely futures that are directly relevant to the AV's intended route.
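The TL;DR mentions an EWTA-driven multi-predictor regime without spelling out what that involves. As a rough illustration only, the sketch below shows the general idea of an evolving winner-takes-all loss: of M predicted trajectories, only the k closest to the ground truth are penalized, and k shrinks toward 1 as training proceeds. The function names, the average-displacement error, and the halving schedule are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def ewta_loss(preds, target, k):
    """Mean error of the k best hypotheses (illustrative, not the paper's code).

    preds:  array of shape (M, T, 2), M candidate trajectories of T (x, y) points
    target: array of shape (T, 2), the ground-truth trajectory
    k:      number of "winning" hypotheses that receive a loss
    """
    # Average displacement error of each hypothesis against the ground truth.
    errors = np.linalg.norm(preds - target[None], axis=-1).mean(axis=-1)  # (M,)
    # Only the k closest hypotheses contribute to the loss.
    winners = np.sort(errors)[:k]
    return winners.mean()

def winners_for_epoch(epoch, num_modes, epochs_per_stage):
    """Hypothetical schedule: halve k every `epochs_per_stage` epochs, never below 1."""
    k = num_modes // (2 ** (epoch // epochs_per_stage))
    return max(1, k)
```

With k equal to M every hypothesis is trained, which avoids dead predictors early on; with k equal to 1 the loss reduces to plain winner-takes-all, which encourages the predictors to specialize into distinct modes.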
