
RetroMotion: Retrocausal Motion Forecasting Models are Instructable

Royden Wagner, Omer Sahin Tas, Felix Hauser, Marlon Steiner, Dominik Strutz, Abhishek Vivekanandan, Carlos Fernandez, Christoph Stiller

TL;DR

RetroMotion tackles multimodal motion forecasting in interactive road scenes by introducing a retrocausal flow that connects marginal and joint trajectory forecasts within a transformer-based framework. It forecasts both per-agent marginals and joint trajectories via a two-stage decoding process, using retrocausal attention and compressed exponential power distributions to represent positional uncertainty. The approach achieves state-of-the-art performance on the Waymo Interaction Prediction dataset and generalizes to Argoverse 2, while also enabling instruction-following through trajectory modifications for goal-based and directional guidance. This has practical implications for safer, more controllable motion planning in autonomous systems and for simulating user-guided scenarios.
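The TL;DR and abstract do not give the density used for positional uncertainty. The sketch below is a hedged illustration only. It assumes the standard univariate exponential power (generalized normal) density with scale α and shape β, and it assumes independent x and y terms per trajectory point, which is a simplification. The paper's "compressed" parameterization and the mixture weight of normal components mentioned in Figure 4 are not reproduced. The names `exp_power_nll` and `point_nll` are hypothetical.

```python
import math
from typing import Sequence


def exp_power_nll(x: float, mu: float, alpha: float, beta: float) -> float:
    """Negative log-likelihood of a univariate exponential power density
        p(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    alpha > 0 is the scale, beta > 0 the shape (beta=1: Laplace, beta=2: normal).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    z = abs(x - mu) / alpha
    return -math.log(beta) + math.log(2.0 * alpha) + math.lgamma(1.0 / beta) + z ** beta


def point_nll(xy: Sequence[float], mu_xy: Sequence[float],
              alpha_xy: Sequence[float], beta: float) -> float:
    """Hypothetical per-trajectory-point loss: sum of independent x and y terms
    sharing one shape parameter (an assumption, not the paper's definition)."""
    return sum(exp_power_nll(v, m, a, beta) for v, m, a in zip(xy, mu_xy, alpha_xy))


if __name__ == "__main__":
    # Ground-truth point (1.0, 2.0) scored against a predicted mean and scale.
    print(point_nll((1.0, 2.0), (0.8, 2.1), (0.5, 0.5), beta=1.5))
```

As a sanity check, β = 2 with α = √2 recovers a standard normal, so `exp_power_nll(0.0, 0.0, math.sqrt(2.0), 2.0)` equals ½·log(2π) ≈ 0.919. Likewise, β = 1 with α = 1 gives the Laplace value log 2 ≈ 0.693. Making β a predicted quantity lets the loss move between heavier-tailed and Gaussian-like error models.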

Abstract

Motion forecasts of road users (i.e., agents) vary in complexity as a function of scene constraints and interactive behavior. We address this with a multi-task learning method for motion forecasting that includes a retrocausal flow of information. The corresponding tasks are to forecast (1) marginal trajectory distributions for all modeled agents and (2) joint trajectory distributions for interacting agents. Using a transformer model, we generate the joint distributions by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. Per trajectory point, we model positional uncertainty using compressed exponential power distributions. Notably, our method achieves state-of-the-art results on the Waymo Interaction Prediction dataset and generalizes well to the Argoverse 2 dataset. Additionally, our method provides an interface for issuing instructions through trajectory modifications. Our experiments show that regular training of motion forecasting leads to the ability to follow goal-based instructions and to adapt basic directional instructions to the scene context. Code: https://github.com/kit-mrt/future-motion
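The following toy sketch in plain Python makes the re-encoding and pairwise step concrete. Everything in it is hypothetical. `encode_query` stands in for the MLP and uses fixed, made-up weights. `decode_joint` is a made-up linear read-out. The attention over scene context is omitted entirely. The sketch shows two properties stated in the abstract and in Figure 1. First, joint modes are decoded from query pairs at the same index k, giving K pairs instead of K² combinations. Second, each query encodes the whole marginal trajectory, so changing a late marginal point can alter the earliest joint point. This is the retrocausal flow.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Traj = List[Point]  # T positions (x, y)


def encode_query(marginal: Traj, dim: int = 8) -> List[float]:
    """Toy one-layer 'MLP' with made-up weights cos(i + j): every query unit
    sees the whole flattened marginal trajectory, including its last points."""
    flat = [c for p in marginal for c in p]
    return [
        math.tanh(sum(math.cos(i + j) * v for i, v in enumerate(flat)) / len(flat))
        for j in range(dim)
    ]


def decode_joint(q_a: List[float], q_b: List[float], horizon: int) -> Tuple[Traj, Traj]:
    """Toy linear read-out: both agents' trajectories are decoded from the
    concatenated query pair, so every step depends on both queries."""
    z = q_a + q_b
    s_a = sum(w * v for w, v in zip(range(1, len(z) + 1), z))
    s_b = sum(w * v for w, v in zip(range(len(z), 0, -1), z))
    traj_a = [(s_a * (t + 1), 0.1 * s_b * (t + 1)) for t in range(horizon)]
    traj_b = [(0.1 * s_a * (t + 1), s_b * (t + 1)) for t in range(horizon)]
    return traj_a, traj_b


def marginal_to_joint(modes_a: List[Traj], modes_b: List[Traj]) -> List[Tuple[Traj, Traj]]:
    """Pair queries at the same index k: K joint modes instead of K * K.
    (Attention between queries and scene context is omitted in this sketch.)"""
    if len(modes_a) != len(modes_b):
        raise ValueError("both agents need the same number of modes K")
    horizon = len(modes_a[0])
    q_a = [encode_query(m) for m in modes_a]
    q_b = [encode_query(m) for m in modes_b]
    return [decode_joint(q_a[k], q_b[k], horizon) for k in range(len(q_a))]


if __name__ == "__main__":
    K, T = 6, 5
    modes_a = [[(float(k + t), float(t)) for t in range(T)] for k in range(K)]
    modes_b = [[(float(t), float(k - t)) for t in range(T)] for k in range(K)]
    joint = marginal_to_joint(modes_a, modes_b)
    print(len(joint), K * K)  # 6 36: joint modes vs. exhaustive pairings

    # Perturb only the LAST marginal point of agent a in mode 0 ...
    modes_a2 = [m[:] for m in modes_a]
    modes_a2[0][-1] = (100.0, 100.0)
    # ... and compare the FIRST decoded joint point of mode 0.
    print(joint[0][0][0] != marginal_to_joint(modes_a2, modes_b)[0][0][0])
```

Pairing by index keeps the joint decoder's cost linear in K. Learning which marginal modes belong together is left to the queries and the attention layers rather than to an explicit K² enumeration.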

Paper Structure

This paper contains 25 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: From marginal to joint trajectories. We use an MLP to generate query matrices $\bm{Q}$ from marginal trajectories and exchange information between queries and scene context with attention mechanisms. Afterwards, we decode joint trajectories $\mathcal{P}^{\text{joint}}_{1:T}$ from pairs of queries at the same index. This compresses information from all $K^2$ possible combinations into $K$ query pairs.
  • Figure 2: Joint and combined motion forecasts of our model. Dynamic agents are shown in blue, static agents in grey (determined at $t=0\,\text{s}$). Lanes are black lines and road markings are white lines. Top left: forecasts for two cars, top right: a car yielding to a pedestrian on a crosswalk, bottom left: forecasts for a car and a cyclist, and bottom right: combined forecasts of two cars and six pedestrians.
  • Figure 3: RetroMotion models are instructable through trajectory modifications.
  • Figure 4: Mixture weight of normal components in exponential power distributions. During training, the weight $w$ progressively increases and reaches higher values for joint trajectory distributions than for marginal distributions.
  • Figure 5: Feature vector collapse (NRC1) metrics. The left plot shows that the feature vectors collapse to a subspace spanned by fewer than 272 principal components. In the right plot, NRC1 does not decrease, indicating that the feature vectors span more than 32 dimensions.
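Figure 5 reports NRC1, a neural-regression-collapse metric, without defining it in the caption. The sketch below assumes one common formulation. It centers the feature vectors, normalizes each to unit length, and measures the mean squared distance to the subspace spanned by the top-d principal directions of the normalized features. A value near 0 means the features effectively live in d dimensions. The function name `nrc1` and the exact normalization are assumptions. The code uses NumPy.

```python
import numpy as np


def nrc1(features: np.ndarray, d: int) -> float:
    """Assumed NRC1: mean squared residual of centered, unit-normalized
    feature vectors after projection onto their top-d principal directions.
    features has shape (num_samples, feature_dim)."""
    h = features - features.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    h = h / np.maximum(norms, 1e-12)
    # Right singular vectors (rows of vt) are ordered by singular value.
    _, _, vt = np.linalg.svd(h, full_matrices=False)
    basis = vt[:d]                      # (d, feature_dim)
    proj = h @ basis.T @ basis          # projection onto the top-d subspace
    return float(np.mean(np.sum((h - proj) ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic features lying in a 3-dimensional subspace of R^10.
    x = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
    for d in (1, 2, 3, 4):
        print(d, nrc1(x, d))
```

Evaluating the metric over a range of d shows how many principal directions are needed before the residual becomes small. Because each vector has unit norm, the value always lies between 0 and 1.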