PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings

Nicholas Rhinehart, Rowan McAllister, Kris Kitani, Sergey Levine

TL;DR

This work addresses predicting uncertain, interactive behaviors of multiple road agents, including how others respond to a controlled agent's goals. It introduces ESP, a flow-based, invertible generative model that yields exact joint likelihoods for multi-agent trajectories and captures agent interactions via factorized latent variables. Building on ESP, PRECOG enables goal-conditioned forecasting by planning in latent space to maximize a posterior-like objective that combines a learned multi-agent prior with a goal likelihood, demonstrated through gradient-based latent planning. Empirical results on CARLA and nuScenes show ESP achieves state-of-the-art forecasting performance, and PRECOG improves both ego and surrounding agents' trajectory predictions when conditioned on the ego agent's goals, highlighting the practical impact for autonomous driving and safe multi-agent coordination.
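As a rough guide to the two ideas above, the following equations sketch how an invertible, per-agent affine update yields an exact joint likelihood, and how a goal term enters the latent planning objective. The affine form and the symbols $\mu^a$, $\sigma^a$, $\phi$ (scene context), $f$ (the map from latents to trajectories), and $\mathcal{G}$ (the goal) are notational assumptions made here for illustration; the paper's own equations give the exact parameterization.

$$
\mathbf{S}_{t+1}^{a} = \mu^{a}(\mathbf{S}_{1:t}, \phi) + \sigma^{a}(\mathbf{S}_{1:t}, \phi)\,\mathbf{Z}_{t+1}^{a},
\qquad \mathbf{Z}_{t+1}^{a} \sim \mathcal{N}(0, I)
$$

Because each state depends on its own latent only through an invertible affine map, the Jacobian of $f$ is block lower-triangular and the change of variables gives the joint log-likelihood in closed form:

$$
\log q(\mathbf{S} \mid \phi) = \sum_{t}\sum_{a}
\Big[ \log \mathcal{N}\big(\mathbf{Z}_{t+1}^{a};\, 0, I\big) - \log \big|\det \sigma^{a}(\mathbf{S}_{1:t}, \phi)\big| \Big],
\qquad
\mathbf{Z}_{t+1}^{a} = \sigma^{a}(\mathbf{S}_{1:t}, \phi)^{-1}\big(\mathbf{S}_{t+1}^{a} - \mu^{a}(\mathbf{S}_{1:t}, \phi)\big)
$$

Goal-conditioned planning then chooses the controlled agent's latents $\mathbf{z}^r$ while the other agents' latents $\mathbf{Z}^h$ stay random:

$$
\mathbf{z}^{r*} = \arg\max_{\mathbf{z}^{r}}\;
\mathbb{E}_{\mathbf{Z}^{h}}\Big[ \log q\big(f(\mathbf{Z}) \mid \phi\big) + \log p\big(\mathcal{G} \mid f(\mathbf{Z}), \phi\big) \Big],
\qquad \mathbf{Z} = (\mathbf{z}^{r}, \mathbf{Z}^{h})
$$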

Abstract

For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information. Towards these capabilities, we present a probabilistic forecasting model of future interactions between a variable number of agents. We perform both standard forecasting and the novel task of conditional forecasting, which reasons about how all agents will likely respond to the goal of a controlled agent (here, the AV). We train models on real and simulated data to forecast vehicle trajectories given past positions and LIDAR. Our evaluation shows that our model is substantially more accurate in multi-agent driving scenarios compared to existing state-of-the-art. Beyond its general ability to perform conditional forecasting queries, we show that our model's predictions of all agents improve when conditioned on knowledge of the AV's goal, further illustrating its capability to model agent interactions.
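To make conditional forecasting concrete, here is a minimal NumPy sketch of a two-agent toy model, not the authors' implementation. It assumes hand-written linear dynamics in place of the learned, LIDAR-conditioned networks, a constant per-agent scale, and finite-difference gradients in place of automatic differentiation. All names (`mu`, `forward`, `inverse`, `log_prob`, `objective`, `plan`) and constants are hypothetical.

```python
import numpy as np

T, A, D = 5, 2, 2                        # time steps, agents (0 = robot, 1 = human), state dims
SIGMA = np.array([0.5, 0.3])             # assumed constant per-agent scale (hypothetical)
COUPLING = 0.2                           # assumed interaction strength (hypothetical)
S0 = np.array([[0.0, 0.0], [1.0, 0.0]])  # initial positions, shape (A, D)


def mu(s_t):
    """Toy mean: each agent drifts in +y and is pulled toward the other agent."""
    drift = np.array([0.0, 1.0])
    return s_t + drift + COUPLING * (s_t[::-1] - s_t)


def forward(z):
    """f: latents of shape (T, A, D) -> trajectories of shape (T, A, D)."""
    s_t, out = S0, []
    for t in range(T):
        s_t = mu(s_t) + SIGMA[:, None] * z[t]
        out.append(s_t)
    return np.stack(out)


def inverse(s):
    """f^{-1}: recover the latents that generate the trajectories s."""
    prev = np.concatenate([S0[None], s[:-1]])
    return np.stack([(s[t] - mu(prev[t])) / SIGMA[:, None] for t in range(T)])


def log_prob(s):
    """Exact joint log-likelihood via the change-of-variables formula."""
    z = inverse(s)
    log_base = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2 * np.pi)
    log_det = T * D * np.sum(np.log(SIGMA))  # Jacobian is block-triangular
    return log_base - log_det


def objective(z_r, z_h_samples, goal, goal_std=0.5):
    """Monte Carlo estimate of E_{Z^h}[log q(f(Z)) + log p(G | f(Z))]."""
    total = 0.0
    for z_h in z_h_samples:
        s = forward(np.stack([z_r, z_h], axis=1))
        # Unnormalized Gaussian goal likelihood on the robot's final position.
        log_goal = -0.5 * np.sum((s[-1, 0] - goal) ** 2) / goal_std ** 2
        total += log_prob(s) + log_goal
    return total / len(z_h_samples)


def plan(goal, n_samples=4, steps=80, lr=0.2, eps=1e-4, seed=0):
    """Gradient ascent on the robot latents; human latents stay sampled."""
    rng = np.random.default_rng(seed)
    z_h_samples = rng.standard_normal((n_samples, T, D))
    z_r = np.zeros((T, D))
    for _ in range(steps):
        grad = np.zeros_like(z_r)
        for idx in np.ndindex(*z_r.shape):  # central finite differences
            step = np.zeros_like(z_r)
            step[idx] = eps
            grad[idx] = (objective(z_r + step, z_h_samples, goal)
                         - objective(z_r - step, z_h_samples, goal)) / (2 * eps)
        z_r = z_r + lr * grad
    return z_r, z_h_samples
```

Calling `plan(np.array([-1.0, 6.0]))` returns robot latents together with the sampled human latents. Passing each pair through `forward` gives joint trajectories in which the human's forecast shifts with the robot's plan, because `mu` couples the two agents. In this toy, that coupling is the only source of the "response to the controlled agent's goal" that the abstract describes.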

Paper Structure

This paper contains 24 sections, 17 equations, 25 figures, 5 tables, 3 algorithms.

Figures (25)

  • Figure 1: Forecasting on nuScenes (Caesar et al., 2019). The input to our model is a high-dimensional LIDAR observation, which informs a distribution over all agents' future trajectories.
  • Figure 2: Conditioning the model on different Car 1 goals produces different predictions: here it forecasts Car 3 to move if Car 1 yields space, or stay stopped if Car 1 stays stopped.
  • Figure 3: Our factorized latent variable model of forecasting and planning, shown for 2 agents. In Fig. \ref{fig:forecast-coinfluence}, our model uses latent variable $\mathbf{Z}_{t+1}^a$ to represent variation in agent $a$'s plausible scene-conditioned reactions to all agents $\mathbf{S}_t$, causing uncertainty in every agent's future states $\mathbf{S}$. Variation exists because of unknown driver goals and different driving styles observed in the training data. Beyond forecasting, our model admits planning robot decisions by deciding $\mathbf{Z}^r\!=\!\mathbf{z}^r$ (Fig. \ref{fig:plan-coinfluence}). Shaded nodes represent observed or determined variables, and square nodes represent robot decisions (Barber, 2012). Thick arrows represent grouped dependencies of non-Markovian $\mathbf{S}_t$ "carried forward" (a regular edge exists between any pair of nodes linked by a chain of thick edges). Note that $\mathbf{Z}$ factorizes across agents, isolating the robot's reaction variable $\mathbf{z}^r$. Human reactions remain uncertain ($\mathbf{Z}^h$ is unobserved) and uncontrollable (the robot cannot decide $\mathbf{Z}^h$), and yet the robot's decisions $\mathbf{z}^r$ will still influence human drivers $\mathbf{S}^h_{2:T}$ (and vice versa). Fig. \ref{fig:arch} shows our implementation. See Appendix \ref{app:implementation} for details.
  • Figure 4: Didactic evaluation. Left plots: R2P2-MA cannot model agent interaction, and generates joint behaviors not present in the data. Right plots: ESP allows agents to influence each other, and does not generate undesirable joint behaviors.
  • Figure 5: Examples of multi-agent forecasting with our learned ESP model. In each scene, 12 joint samples are shown, and LIDAR colors are discretized to near-ground and above-ground. Left: (CARLA) the model predicts Car 1 could either turn left or right, while the other agents' futures remain multimodal in their speeds. Center-left: The model predicts Car 2 will likely wait (it is blocked by Cars 3 and 5), and that Cars 3 and 5 sometimes move forward together and sometimes stay stationary. Center-right: Car 2 is predicted to overtake Car 1, which itself is forecast to continue waiting for pedestrians and Car 2. Right: Car 4 is predicted to wait for the other cars to clear the intersection, and Car 5 is predicted to either start turning or continue straight.
  • ...and 20 more figures