SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts
Royden Wagner, Ömer Sahin Tas, Marlon Steiner, Fabian Konstantinidis, Hendrik Königshof, Marvin Klemp, Carlos Fernandez, Christoph Stiller
TL;DR
SceneMotion forecasts joint trajectories for multiple traffic agents at the scene level. It transforms local agent-centric embeddings into a global scene-wide latent space and decodes that latent space into six joint motion modes. An attention-based latent context module and an anchor-based decoder capture interactions among up to eight focal agents with dense context. The model achieves competitive performance on the Waymo Open Motion and Interaction Prediction benchmarks, and a waypoint-clustering analysis quantifies inter-agent interactions. Key contributions are data-efficient agent-centric representations, a global latent context for joint forecasting, and a quantitative interpretability tool that assesses whether predicted interactions resolve potential conflicts. For planning in autonomous driving, the approach provides both scene-wide forecasts and a mechanism to identify and reason about potential future interactions.
Abstract
Self-driving vehicles rely on multimodal motion forecasts to effectively interact with their environment and plan safe maneuvers. We introduce SceneMotion, an attention-based model for forecasting scene-wide motion modes of multiple traffic agents. Our model transforms local agent-centric embeddings into scene-wide forecasts using a novel latent context module. This module learns a scene-wide latent space from multiple agent-centric embeddings, enabling joint forecasting and interaction modeling. The competitive performance in the Waymo Open Interaction Prediction Challenge demonstrates the effectiveness of our approach. Moreover, we cluster future waypoints in time and space to quantify the interaction between agents. We merge all modes and analyze each mode independently to determine which clusters are resolved through interaction or result in conflict. Our implementation is available at: https://github.com/kit-mrt/future-motion
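The interaction analysis described above can be illustrated with a minimal sketch. It is not the implementation from the repository: the abstract does not specify the clustering algorithm, so this sketch assumes a simple distance threshold at shared time steps, and the names `pair_is_close`, `analyze_interactions`, and `radius` are hypothetical. A pair of agents counts as a potential interaction if their waypoints cluster once all modes are merged; the pair is then checked within each joint mode to see whether the cluster persists (conflict) or disappears (resolved).

```python
from itertools import combinations, product
from math import hypot


def pair_is_close(traj_a, traj_b, radius):
    """True if two trajectories come within `radius` at the same time step."""
    return any(
        hypot(xa - xb, ya - yb) <= radius
        for (xa, ya), (xb, yb) in zip(traj_a, traj_b)
    )


def analyze_interactions(modes, radius=2.0):
    """Classify agent pairs from a list of joint modes.

    modes: list of joint modes; each maps an agent id to a list of (x, y)
           waypoints sampled at shared time steps.
    Returns {(a, b): [indices of joint modes in which the pair still
    conflicts]} for every pair that clusters once all modes are merged.
    An empty list means the potential interaction is resolved in every mode.
    """
    agents = sorted(modes[0])
    result = {}
    for a, b in combinations(agents, 2):
        # Merged view (assumption): compare every mode of a with every mode of b.
        merged_hit = any(
            pair_is_close(mi[a], mj[b], radius)
            for mi, mj in product(modes, repeat=2)
        )
        if not merged_hit:
            continue  # no potential interaction for this pair
        # Per-mode view: a conflict remains only if the pair is close
        # within the same joint mode.
        result[(a, b)] = [
            k for k, m in enumerate(modes) if pair_is_close(m[a], m[b], radius)
        ]
    return result


# Toy intersection: A drives east, B drives north, C is far away.
go_a, yield_a = [(0, 0), (5, 0), (10, 0)], [(0, 0), (0, 0), (1, 0)]
go_b, yield_b = [(5, -5), (5, 0), (5, 5)], [(5, -5), (5, -5), (5, -4)]
far_c = [(100, 100), (100, 101), (100, 102)]

resolved_modes = [
    {"A": go_a, "B": yield_b, "C": far_c},  # A goes first
    {"A": yield_a, "B": go_b, "C": far_c},  # B goes first
]
conflict_modes = resolved_modes + [{"A": go_a, "B": go_b, "C": far_c}]

resolved = analyze_interactions(resolved_modes)
conflict = analyze_interactions(conflict_modes)
```

In the toy example, A and B cluster in the merged view because each agent crosses the intersection in one of the modes. With only the two yielding modes, no single joint mode places them close together, so the pair maps to an empty list. Adding a third mode in which both agents go yields a remaining conflict in that mode.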
