PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings
Nicholas Rhinehart, Rowan McAllister, Kris Kitani, Sergey Levine
TL;DR
This work addresses forecasting the uncertain, interactive behaviors of multiple road agents, including how other agents respond to a controlled agent's goals. It introduces ESP, a flow-based, invertible generative model that yields exact joint likelihoods of multi-agent trajectories and captures agent interactions via factorized latent variables. Building on ESP, PRECOG performs goal-conditioned forecasting through gradient-based planning in ESP's latent space. This planning maximizes a posterior-like objective that combines the learned multi-agent prior with a goal likelihood. Experiments on CARLA and nuScenes show that ESP achieves state-of-the-art forecasting performance. They also show that PRECOG improves trajectory predictions for both the ego agent and the surrounding agents when conditioned on the ego agent's goal, which is directly relevant to autonomous driving and safe multi-agent coordination.
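The two ideas above, exact likelihoods from an invertible latent-to-trajectory map and goal-conditioned planning in latent space, can be illustrated with a deliberately tiny toy. This is a minimal sketch under stated assumptions, not the authors' architecture. The names (`forward`, `inverse`, `log_prob`, `plan`) and every constant are hypothetical. Two agents move in 1-D, and each step applies an affine map of a standard-normal latent: x_t = x_{t-1} + mu(x_{t-1}) + SIGMA * z_t. Fixed, hand-set functions replace the learned networks. The planner uses finite-difference gradients rather than autodiff, and it holds the other agent's latents at zero for simplicity.

```python
import math

# Hypothetical toy, not the paper's model: two agents in 1-D over T steps.
# Agent 0 is the "ego" (controlled) agent; agent 1 is another road agent.
T = 3           # forecast horizon (assumed)
SIGMA = 0.5     # fixed per-step scale (assumed; a learned model would predict it)
SPEED = 1.0     # nominal per-step displacement (assumed)
COUPLING = 0.3  # pull of each agent toward the other, i.e. the "interaction" (assumed)

def mean_step(prev):
    """Deterministic part of each agent's next position, given both previous positions."""
    a, b = prev
    return [a + SPEED + COUPLING * (b - a), b + SPEED + COUPLING * (a - b)]

def forward(z, x0):
    """Map latents z[t][agent] to a trajectory x[t][agent]; invertible step by step."""
    xs, prev = [], x0
    for t in range(T):
        mu = mean_step(prev)
        prev = [mu[i] + SIGMA * z[t][i] for i in range(2)]
        xs.append(prev)
    return xs

def inverse(xs, x0):
    """Recover the latents from a trajectory by undoing forward() one step at a time."""
    zs, prev = [], x0
    for x in xs:
        mu = mean_step(prev)
        zs.append([(x[i] - mu[i]) / SIGMA for i in range(2)])
        prev = x
    return zs

def log_prob(xs, x0):
    """Exact joint log-likelihood via the change-of-variables formula.

    Ordering latents by time, dx/dz is block lower-triangular with SIGMA on the
    diagonal, so log q(x) = sum log N(z; 0, 1) - (2 * T) * log SIGMA.
    """
    zs = inverse(xs, x0)
    log_n = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for z in zs for v in z)
    return log_n - 2 * T * math.log(SIGMA)

def objective(z_ego, x0, goal, goal_std=0.5):
    """Prior log-likelihood plus a Gaussian goal log-likelihood (up to a constant).

    Assumption of this sketch: the other agent's latents are fixed at 0 (their mode).
    """
    xs = forward([[z_ego[t], 0.0] for t in range(T)], x0)
    log_goal = -0.5 * ((xs[-1][0] - goal) / goal_std) ** 2
    return log_prob(xs, x0) + log_goal

def plan(x0, goal, steps=200, lr=0.05, eps=1e-5):
    """Gradient ascent on the ego latents, using central finite-difference gradients."""
    z_ego = [0.0] * T
    for _ in range(steps):
        grad = []
        for t in range(T):
            up, down = z_ego[:], z_ego[:]
            up[t] += eps
            down[t] -= eps
            grad.append((objective(up, x0, goal) - objective(down, x0, goal)) / (2 * eps))
        z_ego = [z + lr * g for z, g in zip(z_ego, grad)]
    return forward([[z_ego[t], 0.0] for t in range(T)], x0)
```

Under these assumptions, starting from `x0 = [0.0, 2.0]` the unconditioned forecast (all latents zero) puts the ego at about 3.94 after three steps. `plan(x0, goal=6.0)` moves it to roughly 5.27, trading goal proximity against prior likelihood. The other agent's final position also shifts, from about 4.06 to about 4.40, even though its latents never change. The shift comes only from the coupling term, a toy analogue of conditioning on the ego's goal changing the predictions for the other agents.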
Abstract
For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information. Towards these capabilities, we present a probabilistic forecasting model of future interactions between a variable number of agents. We perform both standard forecasting and the novel task of conditional forecasting, which reasons about how all agents will likely respond to the goal of a controlled agent (here, the AV). We train models on real and simulated data to forecast vehicle trajectories given past positions and LIDAR. Our evaluation shows that our model is substantially more accurate in multi-agent driving scenarios than the existing state of the art. Beyond its general ability to perform conditional forecasting queries, we show that our model's predictions of all agents improve when conditioned on knowledge of the AV's goal, further illustrating its capability to model agent interactions.
