RetroMotion: Retrocausal Motion Forecasting Models are Instructable
Royden Wagner, Omer Sahin Tas, Felix Hauser, Marlon Steiner, Dominik Strutz, Abhishek Vivekanandan, Carlos Fernandez, Christoph Stiller
TL;DR
RetroMotion tackles multimodal motion forecasting in interactive road scenes by introducing a retrocausal flow of information that connects marginal and joint trajectory forecasts within a transformer-based framework. It forecasts per-agent marginal trajectory distributions and joint trajectory distributions for interacting agents via a two-stage decoding process. Retrocausal attention links the two stages, and compressed exponential power distributions represent positional uncertainty. The approach achieves state-of-the-art performance on the Waymo Interaction Prediction dataset and generalizes to Argoverse 2. It also supports instruction following through trajectory modifications for goal-based and directional guidance. This has practical implications for safer, more controllable motion planning in autonomous systems and for simulating user-guided scenarios.
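The exponential power family behind this uncertainty model can be sketched as follows. This is a minimal, hypothetical Python illustration, not the RetroMotion implementation. It assumes the standard univariate exponential power (generalized normal) density with location `mu`, scale `alpha`, and shape `beta`, applied independently to the x and y axes of each trajectory point with one shared `beta`. We read "compressed" as `beta > 1`, by analogy with compressed exponential functions; the paper's exact parameterization may differ. The names `exp_power_logpdf`, `point_nll`, and `trajectory_nll` are our own.

```python
import math
from typing import Sequence


def exp_power_logpdf(x: float, mu: float, alpha: float, beta: float) -> float:
    """Log-density of a univariate exponential power (generalized normal) distribution.

    f(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)

    alpha > 0 is the scale and beta > 0 the shape. beta = 1 gives a Laplace
    distribution, beta = 2 a Gaussian with sigma = alpha / sqrt(2).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    z = abs(x - mu) / alpha
    return math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta) - z ** beta


def point_nll(pred: Sequence[float], target: Sequence[float],
              alpha: Sequence[float], beta: float) -> float:
    """Negative log-likelihood of one 2D trajectory point.

    Hypothetical assumption: x and y are independent, each with its own scale,
    and both axes share the shape parameter beta.
    """
    return -sum(exp_power_logpdf(t, m, a, beta)
                for t, m, a in zip(target, pred, alpha))


def trajectory_nll(preds: Sequence[Sequence[float]],
                   targets: Sequence[Sequence[float]],
                   alphas: Sequence[Sequence[float]], beta: float) -> float:
    """Sum of per-point NLLs over a trajectory (hypothetical aggregation)."""
    return sum(point_nll(p, t, a, beta) for p, t, a in zip(preds, targets, alphas))
```

For example, `point_nll((1.0, 2.0), (1.2, 1.9), (0.5, 0.5), beta=1.5)` scores one predicted point against a ground-truth point. Values of `beta` between 1 and 2 give tails that are lighter than a Laplace distribution but heavier than a Gaussian.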
Abstract
Motion forecasts of road users (i.e., agents) vary in complexity as a function of scene constraints and interactive behavior. We address this with a multi-task learning method for motion forecasting that includes a retrocausal flow of information. The corresponding tasks are to forecast (1) marginal trajectory distributions for all modeled agents and (2) joint trajectory distributions for interacting agents. Using a transformer model, we generate the joint distributions by re-encoding the marginal distributions and then applying pairwise modeling. This introduces a retrocausal flow of information from later points in the marginal trajectories to earlier points in the joint trajectories. For each trajectory point, we model positional uncertainty using compressed exponential power distributions. Notably, our method achieves state-of-the-art results on the Waymo Interaction Prediction dataset and generalizes well to the Argoverse 2 dataset. Additionally, our method provides an interface for issuing instructions through trajectory modifications. Our experiments show that regular motion forecasting training yields the ability to follow goal-based instructions and to adapt basic directional instructions to the scene context. Code: https://github.com/kit-mrt/future-motion
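The direction of information flow described above can be illustrated with a toy cross-attention sketch in Python. It is hypothetical and greatly simplified: a single head, no learned projections, and hand-picked vectors standing in for re-encoded marginal trajectory points (keys and values) and joint trajectory steps (queries). It does not reproduce the paper's architecture. The names `cross_attention` and `causal_mask` are our own. Leaving the mask off lets early joint steps read later marginal points, which is the retrocausal flow. A causal mask that restricts joint step `t` to marginal steps `s <= t` removes it.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix,
                    mask: Optional[List[List[bool]]] = None) -> Matrix:
    """Single-head scaled dot-product attention without learned projections.

    queries: one token per joint-trajectory step, shape [T_joint][d].
    keys, values: one token per marginal-trajectory step, shape [T_marg][d].
    mask[t][s] is True if joint step t may attend to marginal step s;
    mask=None means every joint step sees every marginal step.
    """
    d = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        allowed = [s for s in range(len(keys)) if mask is None or mask[t][s]]
        scores = [sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
                  for s in allowed]
        weights = softmax(scores)
        out.append([sum(w * values[s][j] for w, s in zip(weights, allowed))
                    for j in range(len(values[0]))])
    return out


def causal_mask(t_joint: int, t_marg: int) -> List[List[bool]]:
    """Joint step t may only see marginal steps s <= t (no retrocausal flow)."""
    return [[s <= t for s in range(t_marg)] for t in range(t_joint)]


# Toy marginal trajectory tokens: only the final point carries a signal.
marginal = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
joint_queries = [[1.0, 1.0] for _ in range(4)]

retro = cross_attention(joint_queries, marginal, marginal)
causal = cross_attention(joint_queries, marginal, marginal, causal_mask(4, 4))

print("retrocausal step 0:", retro[0])  # picks up the final marginal point
print("causal step 0:", causal[0])      # sees only marginal step 0, so [0.0, 0.0]
```

In this toy setting, the first joint step is influenced by the last marginal point only when the mask is omitted. Under this simplified view, editing a late marginal point, such as a goal, can change early joint steps. That is consistent with the instruction interface described above, but it is our reading rather than a claim from the source.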
