Open-World Motion Forecasting

Nicolas Schischka; Nikhil Gosala; B Ravi Kiran; Senthil Yogamani; Abhinav Valada

TL;DR

This work proposes the first end-to-end class-incremental motion forecasting framework, which mitigates catastrophic forgetting while learning to forecast newly introduced classes, and shows that it maintains performance on previously learned classes while improving adaptation to novel ones.

Abstract

Motion forecasting aims to predict the future trajectories of dynamic agents in the scene, enabling autonomous vehicles to effectively reason about scene evolution. Existing approaches operate under the closed-world regime and assume a fixed object taxonomy as well as access to high-quality perception. Therefore, they struggle in real-world settings where perception is imperfect and the object taxonomy evolves over time. In this work, we bridge this fundamental gap by introducing open-world motion forecasting, a novel setting in which new object classes are sequentially introduced over time and future object trajectories are estimated directly from camera images. We tackle this setting by proposing the first end-to-end class-incremental motion forecasting framework, which mitigates catastrophic forgetting while simultaneously learning to forecast newly introduced classes. When a new class is introduced, our framework employs a pseudo-labeling strategy to first generate motion forecasting pseudo-labels for all known classes, which are then processed by a vision-language model to filter inconsistent and over-confident predictions. In parallel, our approach further mitigates catastrophic forgetting by using a novel replay sampling strategy that leverages query feature variance to sample previous sequences with informative motion patterns. Extensive evaluation on the nuScenes and Argoverse 2 datasets demonstrates that our approach successfully resists catastrophic forgetting and maintains performance on previously learned classes while improving adaptation to novel ones. Further, we demonstrate that our approach supports zero-shot transfer to real-world driving and naturally extends to end-to-end class-incremental planning, enabling continual adaptation of the full autonomous driving system. We provide the code at https://omen.cs.uni-freiburg.de.
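The replay sampling idea above can be illustrated with a small sketch. It assumes, hypothetically, that each stored training sequence is summarized by the old model's object-query feature vectors and that a sequence's informativeness is scored as the mean per-dimension variance of those vectors. The helper names `sequence_score` and `select_replay` and the greedy top-k selection are illustrative choices, not the authors' implementation.

```python
from statistics import pvariance
from typing import Dict, List, Sequence

# Hypothetical representation: one feature vector per object query of the
# old model, collected over a stored training sequence.
QueryFeatures = Sequence[Sequence[float]]


def sequence_score(query_features: QueryFeatures) -> float:
    """Mean per-dimension variance of the query features of one sequence.

    Assumption: higher variance is used as a proxy for more diverse,
    and therefore more informative, motion patterns.
    """
    if len(query_features) < 2 or not query_features[0]:
        return 0.0  # no spread can be measured with fewer than two queries
    dims = len(query_features[0])
    per_dim = [pvariance([q[d] for q in query_features]) for d in range(dims)]
    return sum(per_dim) / dims


def select_replay(features_by_seq: Dict[str, QueryFeatures], budget: int) -> List[str]:
    """Return the IDs of the `budget` highest-scoring sequences (greedy top-k)."""
    scores = {seq_id: sequence_score(f) for seq_id, f in features_by_seq.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    buffer = select_replay(
        {
            "static": [[1.0, 1.0], [1.0, 1.0]],
            "busy": [[0.0, 0.0], [4.0, 2.0]],
            "mid": [[0.0, 0.0], [1.0, 1.0]],
        },
        budget=2,
    )
    print(buffer)  # ['busy', 'mid']
```

With this scoring, a sequence whose queries are nearly identical scores close to zero and is replayed last, while sequences whose queries spread out in feature space are preferred. A practical system might instead sample stochastically in proportion to the scores so the buffer does not collapse onto a few extreme sequences.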

Paper Structure (28 sections, 6 equations, 10 figures, 9 tables)

Introduction
Related Work
Technical Approach
Open-World Motion Prediction
VLM-Guided Pseudo-Label Generation
Pseudo-Labeling for Motion Prediction
VLM-Based Filtering of False Positives
Sequence-Based Experience Replay
Extension to Class-Incremental Open-Loop Planning
Experimental Results
Datasets
Baselines
Training Protocol
Quantitative Results
Ablation Study
...and 13 more sections

Figures (10)

Figure 1: Our approach is the first to tackle the problem of open-world motion forecasting. In contrast to (a) traditional and (b) end-to-end motion forecasting, (c) the underlying model is trained incrementally, with access to labels only for a subset of all classes $C^n$ and raw multi-view camera images. As a result, it continually learns to forecast the motion of all classes in an end-to-end manner, while handling imperfect object detections and successfully combating catastrophic forgetting.
Figure 2: Illustration of the proposed OMEN architecture. At each incremental step $i$, we create detection and motion forecasting pseudo-labels for the old categories with the old model $\Phi^{i-1}$, filter them by matching them against the predictions of a VLM, and add them to the detection ($\triangle$) and motion forecasting ($\circ$) ground truth of $D^i$, as detailed in the section on VLM-guided pseudo-label generation. Furthermore, a replay buffer is created based on the latent space of the old model, as described in the section on sequence-based experience replay.
Figure 3: Qualitative results of OMEN in comparison to CL-DETR and the upper bound. We show the camera input images with our model's 3D object detections on the left, and the top-3 predicted modes per object for all methods in the bird's-eye-view perspective on the right.
Figure 4: Zero-shot qualitative results of OMEN deployed on real-world data from our self-driving car. OMEN successfully retains its forecasting knowledge of the first two introduced classes, car and pedestrian.
Figure S.1: Visualization of the VLM-based mask filtering for objects of the car class.
...and 5 more figures
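The pseudo-label filtering step summarized in the Figure 2 caption can be sketched as follows. This is a simplified illustration under stated assumptions: the old model's ($\Phi^{i-1}$) pseudo-labels are assumed to be already projected into the image as 2D boxes, a pseudo-label is kept only if it clears a hypothetical confidence threshold and overlaps a VLM detection of the same class, and the surviving labels are appended to the new-class ground truth of $D^i$. `Detection`, `filter_pseudo_labels`, `merge_ground_truth` and both thresholds are hypothetical, and motion forecasting pseudo-labels are assumed to travel with their matched detection. Figure S.1 indicates that OMEN's filtering operates on masks, so the actual matching criterion likely differs.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: 2D boxes (x1, y1, x2, y2) in image pixels; OMEN may match masks instead.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """Hypothetical container for a single detection or pseudo-label."""
    label: str
    box: Box
    score: float = 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_pseudo_labels(
    pseudo: Sequence[Detection],
    vlm: Sequence[Detection],
    score_thresh: float = 0.3,  # hypothetical confidence cut-off
    iou_thresh: float = 0.5,    # hypothetical matching threshold
) -> List[Detection]:
    """Keep old-model pseudo-labels that a same-class VLM detection confirms."""
    kept = []
    for p in pseudo:
        if p.score < score_thresh:
            continue  # too uncertain to use as a pseudo-label
        confirmed = any(
            v.label == p.label and iou(p.box, v.box) >= iou_thresh for v in vlm
        )
        if confirmed:
            kept.append(p)  # unconfirmed predictions are dropped even if confident
    return kept


def merge_ground_truth(
    new_class_gt: Sequence[Detection],
    pseudo: Sequence[Detection],
    vlm: Sequence[Detection],
) -> List[Detection]:
    """Ground truth for step i: new-class labels plus filtered pseudo-labels."""
    return list(new_class_gt) + filter_pseudo_labels(pseudo, vlm)
```

In this sketch, a confident old-model prediction with no matching VLM detection is discarded, which is how it approximates removing over-confident false positives before they are reused as supervision.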