FIMP: Future Interaction Modeling for Multi-Agent Motion Prediction

Sungmin Woo, Minjung Kim, Donghyeong Kim, Sungjun Jang, Sangyoun Lee

TL;DR

The paper tackles uncertain multi-agent interactions in motion forecasting by introducing FIMP, a framework that decouples potential future information from history through a dedicated future decoder. FIMP learns future affinities among agents and applies top-$k$ filtering to identify interacting pairs for targeted message passing, enabling end-to-end, multi-modal predictions with reduced reliance on pre-estimated future cues. The method temporalizes mode embeddings into sparse future time zones and outputs a Laplace-distributed prediction, achieving state-of-the-art minFDE and minADE on Argoverse while maintaining real-time inference. This implicit future interaction modeling offers a practical alternative to explicit future-state conditioning, improving realism and robustness in multi-agent forecasting.
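The two quantities named above, a Laplace-distributed output and the minADE/minFDE metrics, can be made concrete with a short sketch. It assumes each future timestep is modeled by an independent Laplace distribution per coordinate, with location $\mu$ and scale $b$, so that the per-coordinate negative log-likelihood is $\log(2b) + |y-\mu|/b$. The function names `laplace_nll` and `min_ade_fde` are hypothetical and are not taken from the FIMP code. For the metrics, the sketch picks the best of the $K$ predicted modes by final displacement error and reports that mode's average displacement error. Some evaluations take the two minima independently, so check the benchmark's own evaluation code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def laplace_nll(loc: Sequence[Point], scale: Sequence[Point],
                target: Sequence[Point]) -> float:
    """NLL of `target` under independent per-coordinate Laplace distributions,
    summed over x and y and averaged over timesteps.

    Assumption (not from the paper): -log p(y) = log(2b) + |y - mu| / b, b > 0.
    """
    total = 0.0
    for (mx, my), (bx, by), (yx, yy) in zip(loc, scale, target):
        total += math.log(2.0 * bx) + abs(yx - mx) / bx
        total += math.log(2.0 * by) + abs(yy - my) / by
    return total / len(target)


def min_ade_fde(modes: Sequence[Sequence[Point]],
                target: Sequence[Point]) -> Tuple[float, float]:
    """Select the mode with the lowest final displacement error (FDE)
    and return (ADE, FDE) of that mode."""
    if not modes:
        raise ValueError("need at least one predicted mode")
    best = None
    for traj in modes:
        # Euclidean error at each timestep between prediction and ground truth
        errors = [math.dist(p, g) for p, g in zip(traj, target)]
        ade, fde = sum(errors) / len(errors), errors[-1]
        if best is None or fde < best[1]:
            best = (ade, fde)
    return best
```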

Abstract

Multi-agent motion prediction is a crucial concern in autonomous driving, yet it remains challenging owing to the ambiguous intentions of dynamic agents and their intricate interactions. Existing studies have attempted to capture interactions between road entities using only the definite data from historical timesteps, since future information is unavailable and highly uncertain. However, without sufficient guidance on the future states of interacting agents, these methods frequently produce unrealistic trajectory overlaps. In this work, we propose Future Interaction modeling for Motion Prediction (FIMP), which captures potential future interactions in an end-to-end manner. FIMP adopts a future decoder that implicitly extracts potential future information at an intermediate feature level, and identifies interacting entity pairs through future affinity learning and a top-$k$ filtering strategy. Experiments show that our future interaction modeling improves performance remarkably, yielding superior results on the Argoverse motion forecasting benchmark.
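The top-$k$ filtering idea can be illustrated with a small sketch. Here the learned future affinity is replaced by a scaled dot product between per-agent future features, which is an assumption made purely for illustration. Each agent keeps only its $k$ highest-affinity partners (self-pairs excluded), and messages are aggregated with softmax weights over those partners plus a residual connection. All function names are hypothetical; the actual FIMP affinity network and aggregation may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def dot_affinity(features: Matrix) -> Matrix:
    """Pairwise scaled dot-product affinity (hypothetical stand-in for a learned affinity)."""
    scale = 1.0 / math.sqrt(len(features[0]))
    return [[scale * sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def topk_neighbors(affinity: Matrix, k: int) -> List[List[int]]:
    """For each agent i, the indices of the k other agents with the highest affinity."""
    n = len(affinity)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]  # exclude self-pairs
        others.sort(key=lambda j: affinity[i][j], reverse=True)
        neighbors.append(others[:k])
    return neighbors


def message_pass(features: Matrix, affinity: Matrix,
                 neighbors: List[List[int]]) -> Matrix:
    """Residual update: each agent adds a softmax-weighted sum of its selected neighbors."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            updated.append(list(features[i]))
            continue
        logits = [affinity[i][j] for j in nbrs]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(features[i])
        message = [sum(w * features[j][d] for w, j in zip(weights, nbrs))
                   for d in range(dim)]
        updated.append([f + m for f, m in zip(features[i], message)])
    return updated
```

Restricting each agent to $k$ partners keeps the cost of message passing linear in the number of agents for fixed $k$, and it suppresses messages from agents whose futures are unlikely to interact.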

Paper Structure (16 sections, 15 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Overview of interaction modeling in motion prediction. (a) Interaction based on observed historical information. (b) Conditional prediction based on estimated high-level future states. (c) Our interaction modeling based on feature-level potential future information.
  • Figure 2: Architecture of the FIMP framework. Our network consists of two parts, one for history feature learning and one for future feature learning. The future decoder separates the future feature space from the history, enabling interaction modeling in their respective time zones.
  • Figure 3: Network structure from the future decoder to the prediction head. For brevity, only the prediction for a single mode of agent $i$ is illustrated. $\text{GRU}_2$ works similarly to $\text{GRU}_1$, except that the interaction-aware zone-wise future feature $\tilde{F}^{m,z}$ is repeated only for the timesteps its zone covers.
  • Figure 4: Qualitative comparison of models in scenarios where future interaction modeling is essential. The trajectories of interacting agents are shown in green and orange, while ground-truth trajectories are shown in red.
  • Figure 5: In this scenario, the orange agent comes to a halt at an intersection because a green agent is passing by. As FIMP takes into account the future motion of the interacting agent, it predicts that the orange agent will not approach the green agent, whereas other models predict that the orange agent may attempt to proceed through the green agent's future trajectory.
  • ...and 5 more figures
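The Figure 3 caption notes that each zone-wise future feature $\tilde{F}^{m,z}$ is repeated only for the timesteps its zone covers before it enters $\text{GRU}_2$. A minimal sketch of that expansion step follows, assuming the future horizon is split into contiguous, near-equal zones; the paper's actual zone boundaries are not stated here. `zone_boundaries` and `expand_zone_features` are hypothetical helpers, and the GRU itself is omitted.

```python
from typing import List, Sequence


def zone_boundaries(num_steps: int, num_zones: int) -> List[range]:
    """Split timesteps 0..num_steps-1 into contiguous zones of near-equal length.

    Hypothetical equal split: earlier zones absorb the remainder.
    """
    if num_zones <= 0:
        raise ValueError("num_zones must be positive")
    base, extra = divmod(num_steps, num_zones)
    zones, start = [], 0
    for z in range(num_zones):
        length = base + (1 if z < extra else 0)
        zones.append(range(start, start + length))
        start += length
    return zones


def expand_zone_features(zone_feats: Sequence[Sequence[float]],
                         num_steps: int) -> List[List[float]]:
    """Repeat each zone's feature vector once per timestep in that zone,
    producing a per-timestep input sequence (e.g. for a recurrent decoder)."""
    sequence: List[List[float]] = []
    for feat, steps in zip(zone_feats, zone_boundaries(num_steps, len(zone_feats))):
        sequence.extend(list(feat) for _ in steps)
    return sequence
```

With Argoverse's 30-step (3 s at 10 Hz) future horizon and, say, three zones (an arbitrary choice for illustration), each zone feature would fill 10 consecutive timesteps.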