
MAP-Former: Multi-Agent-Pair Gaussian Joint Prediction

Marlon Steiner, Marvin Klemp, Christoph Stiller

TL;DR

The paper identifies a critical gap in trajectory risk assessment: current predictors provide either single-agent forecasts or joint predictions without explicit dependency modeling between interacting agents. It introduces MAP-Former, a multi-module architecture that predicts both per-agent futures and agent-pair covariance matrices to construct scene-wide Gaussian joint PDFs, using a Cholesky-based parameterization to guarantee symmetric, positive-definite covariances. The approach combines a Temporal Encoder, a Spatial/Interaction Encoder (GNN or Transformer), and a Factorized Transformer Decoder, followed by a Multihead Agent-Pair Prediction head and a multivariate Gaussian NLL loss. Evaluated on the rounD dataset, MAP-Former full demonstrates strong joint-prediction performance and provides a principled foundation for risk analysis based on inter-agent correlations.
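The TL;DR says MAP-Former guarantees symmetric, positive-definite covariances through a Cholesky-based parameterization. The sketch below shows the general technique, assuming a 4D agent-pair state (x and y for each of the two agents), so the network would emit 4 + 6 = 10 unconstrained numbers per pair. It also assumes a softplus on the diagonal. The helper name `cholesky_to_covariance` and the softplus activation are illustrative choices, not taken from the paper.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), strictly positive for real x
    return np.logaddexp(0.0, x)


def cholesky_to_covariance(raw, dim=4, eps=1e-6):
    """Map an unconstrained vector of length dim*(dim+1)/2 to an SPD matrix.

    Hypothetical helper (assumption): the first `dim` entries become the
    diagonal of a lower-triangular factor L (forced positive via softplus
    plus eps), and the remaining entries fill the strictly lower triangle
    row by row. Sigma = L @ L.T is then symmetric positive definite.
    """
    raw = np.asarray(raw, dtype=float)
    n_off = dim * (dim - 1) // 2
    if raw.shape != (dim + n_off,):
        raise ValueError(f"expected {dim + n_off} values, got shape {raw.shape}")
    L = np.zeros((dim, dim))
    L[np.diag_indices(dim)] = softplus(raw[:dim]) + eps  # positive diagonal
    rows, cols = np.tril_indices(dim, k=-1)              # strictly lower triangle
    L[rows, cols] = raw[dim:]
    return L @ L.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma = cholesky_to_covariance(rng.normal(size=10))
    print(np.linalg.eigvalsh(sigma))  # all eigenvalues are positive
```

Because L has a strictly positive diagonal, it is invertible, so L Lᵀ is positive definite for any raw network output. This lets a network regress covariances without a projection or clipping step.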

Abstract

In trajectory risk assessment, there is a gap between the trajectory information provided by a traffic motion prediction module and the information that is actually needed. Closing this gap requires advancing prediction beyond current practice. Existing prediction models yield either joint predictions of agents' future trajectories with uncertainty weights or marginal Gaussian probability density functions (PDFs) for single agents. Although these methods achieve highly accurate trajectory predictions, they provide little or no information about the dependencies between interacting agents. Since traffic is a process of highly interdependent agents whose actions directly influence each other's behavior, the existing methods are not sufficient to reliably assess the risk of future trajectories. This paper addresses that gap by introducing a novel approach to motion prediction that focuses on predicting agent-pair covariance matrices in a "scene-centric" manner; these matrices can then be used to model Gaussian joint PDFs for all agent-pairs in a scene. We propose a model capable of predicting these agent-pair covariance matrices, leveraging an enhanced awareness of interactions. The prediction results of our model form the foundation for comprehensive risk assessment with statistically based methods that analyze agents' relations through their joint PDFs.
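To make the abstract's point about dependencies concrete, the sketch below evaluates a multivariate Gaussian negative log-likelihood for one agent-pair. It assumes a 4D state ordered (x_i, y_i, x_j, y_j); that ordering and the function name `gaussian_nll` are illustrative assumptions. With zero cross-covariance, the joint NLL reduces to the sum of the two marginal NLLs, so a marginal-only predictor cannot express how the agents move together.

```python
import numpy as np


def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under N(mu, sigma), computed via Cholesky.

    Assumed state layout (hypothetical): (x_i, y_i, x_j, y_j) for an agent-pair.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d = x.shape[-1]
    L = np.linalg.cholesky(sigma)        # sigma = L @ L.T, raises if not SPD
    z = np.linalg.solve(L, x - mu)       # z.T @ z = (x-mu).T sigma^-1 (x-mu)
    mahalanobis_sq = z @ z
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (d * np.log(2.0 * np.pi) + log_det + mahalanobis_sq)


if __name__ == "__main__":
    mu = np.zeros(4)
    # Both agents deviate by +1 m in x from their predicted means
    x = np.array([1.0, 0.0, 1.0, 0.0])
    independent = np.eye(4)
    # Same marginals, but the x-positions of the two agents are correlated (rho = 0.8)
    correlated = np.array([[1.0, 0.0, 0.8, 0.0],
                           [0.0, 1.0, 0.0, 0.0],
                           [0.8, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
    print(gaussian_nll(x, mu, independent), gaussian_nll(x, mu, correlated))
```

When both agents deviate in the same direction, the correlated covariance assigns that outcome a lower NLL than the independent one. This is the kind of inter-agent information a joint PDF carries and a product of marginal PDFs does not.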

Paper Structure

This paper contains 10 sections, 5 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Representation of the prediction model's output: one Gaussian joint PDF of an agent-pair for every time step and every mode (subfigures a and b), based on the predicted covariance matrices. The different shadings of the ellipses (joint PDFs) represent consecutive time steps. For visualization, 2D ellipses are shown instead of the 4D Gaussian PDFs that the model actually predicts. Combining the modes with their uncertainty weights yields a Gaussian mixture PDF.
  • Figure 2: Different tasks in motion prediction. Here the first two columns represent the past and the last three the future.
  • Figure 3: Network architecture of our motion prediction model. We use a Temporal Encoder (TEnc, top left), and the model variants switch between a GNN-based Spatial and Interaction Encoder (SaIEnc, middle left), a Transformer-based SaIEnc (bottom left), or no SaIEnc. The red points in the encoders represent the agents. The colors blue, orange, and yellow of the tokens, embeddings, and trajectories identify the corresponding agents.
  • Figure 4: Visual prediction results: a scene from rounD [krajewski2020rounD] with twelve agents (red points). For every agent, the figure shows its past trajectory (gray), its ground truth (black), and its predicted trajectory up to $t=3\,\mathrm{s}$ (colored). The lines connecting the agent-pairs represent the upper off-diagonal blocks of the predicted covariance matrices and therefore describe the dependencies between the agents of each pair.
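Figures 1 and 4 describe two ways of reading the predicted agent-pair covariances. Per Figure 1, the mode-wise Gaussians combine with their uncertainty weights into a Gaussian mixture. Per Figure 4, the upper off-diagonal block of each 4×4 covariance encodes the pair's dependency. The sketch below illustrates both operations, assuming each mode supplies a 4D mean and covariance in the order (x_i, y_i, x_j, y_j) and that the mode weights sum to one. All function names are hypothetical, and normalizing the off-diagonal block into correlations is our illustrative choice, not necessarily how the paper draws the lines in Figure 4.

```python
import numpy as np


def gaussian_log_pdf(x, mu, sigma):
    """log N(x; mu, sigma) for one mode, evaluated via a Cholesky factor."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d = x.shape[-1]
    L = np.linalg.cholesky(sigma)
    z = np.linalg.solve(L, x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def mixture_log_pdf(x, weights, mus, sigmas):
    """log sum_k w_k N(x; mu_k, sigma_k), using log-sum-exp for stability."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mode weights must sum to 1")
    terms = np.array([np.log(w) + gaussian_log_pdf(x, m, s)
                      for w, m, s in zip(weights, mus, sigmas)])
    t_max = terms.max()
    return t_max + np.log(np.exp(terms - t_max).sum())


def cross_correlation(sigma):
    """Correlations between agent i's (x, y) and agent j's (x, y).

    Normalizes the 4x4 pair covariance to a correlation matrix and returns
    its upper off-diagonal 2x2 block (rows: agent i, columns: agent j).
    """
    std = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(std, std)
    return corr[:2, 2:]
```

For example, a pair covariance whose only cross term links the two agents' x-coordinates with correlation 0.8 yields the block [[0.8, 0], [0, 0]]. The log-sum-exp form keeps the mixture density finite even when individual mode likelihoods underflow.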