
Transfer Learning Study of Motion Transformer-based Trajectory Predictions

Lars Ullrich, Alex McMaster, Knut Graichen

TL;DR

The paper investigates transfer learning for Motion Transformer (MTR)-based trajectory prediction by transferring knowledge from the Waymo Open Motion Dataset (WOMD) to a CarMaker-generated target dataset (CMD). It compares three transfer paradigms (Multi-task Learning, Feature Reuse, and Fine-Tuning, including encoder-only and decoder-only variants) and evaluates both predictive performance and training-time implications. Results show that fine-tuning, particularly encoder fine-tuning (FTE), yields the best target-domain performance with a substantial reduction in training time, whereas multi-task learning provides limited gains and can suffer from forgetting. The study highlights encoder-focused adaptation as a practical path for real-world deployment and emphasizes the need for larger, more diverse datasets to further validate transferability across environments and traffic regulations.
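The paradigms compared above differ mainly in which parts of the model start from the WOMD-pretrained weights and which parts are updated during training. The following minimal sketch encodes one plausible reading of each paradigm as plain data. The module prefixes `encoder`/`decoder`, the names `TransferSetup`, `SETUPS` and `trainable_parameters`, and the exact freeze and initialization choices (especially for Feature Reuse and Multi-task Learning) are assumptions for illustration, not the paper's confirmed training setup.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence

ENC, DEC = "encoder", "decoder"  # hypothetical top-level module prefixes


@dataclass(frozen=True)
class TransferSetup:
    from_source: FrozenSet[str]  # modules initialized from WOMD-pretrained weights
    trainable: FrozenSet[str]    # modules updated during training
    datasets: FrozenSet[str]     # datasets used during this training stage


# One plausible reading of each paradigm (assumption, not the paper's exact setup).
SETUPS = {
    # Multi-task learning: train all modules jointly on source and target data.
    "multi_task": TransferSetup(frozenset(), frozenset({ENC, DEC}), frozenset({"WOMD", "CMD"})),
    # Feature reuse: keep the pretrained encoder frozen, train a fresh decoder on CMD.
    "feature_reuse": TransferSetup(frozenset({ENC}), frozenset({DEC}), frozenset({"CMD"})),
    # Full fine-tuning: start from all source weights, update everything on CMD.
    "fine_tune": TransferSetup(frozenset({ENC, DEC}), frozenset({ENC, DEC}), frozenset({"CMD"})),
    # FTE: start from all source weights, update only the encoder on CMD.
    "fine_tune_encoder": TransferSetup(frozenset({ENC, DEC}), frozenset({ENC}), frozenset({"CMD"})),
    # Decoder-only fine-tuning: start from all source weights, update only the decoder.
    "fine_tune_decoder": TransferSetup(frozenset({ENC, DEC}), frozenset({DEC}), frozenset({"CMD"})),
}


def trainable_parameters(param_names: Sequence[str], setup: TransferSetup) -> List[str]:
    """Return the parameter names whose module prefix is trainable under `setup`."""
    return [name for name in param_names if name.split(".", 1)[0] in setup.trainable]


if __name__ == "__main__":
    names = ["encoder.polyline.weight", "encoder.attn.weight", "decoder.query.weight"]
    print(trainable_parameters(names, SETUPS["fine_tune_encoder"]))
    # ['encoder.polyline.weight', 'encoder.attn.weight']
```

In PyTorch, the returned names could be matched against `model.named_parameters()` and every other parameter frozen by setting `requires_grad = False`. Keeping the paradigm definitions as data lets all variants share a single training loop, which also makes training-time comparisons more like-for-like.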

Abstract

Trajectory planning in autonomous driving is highly dependent on predicting the emergent behavior of other road users. Learning-based methods are currently showing impressive results in simulation-based challenges, with transformer-based architectures technologically leading the way. Ultimately, however, predictions are needed in the real world. In addition to the shift from simulation to the real world, many vehicle- and country-specific shifts, i.e., differences in sensor systems, fusion and perception algorithms, as well as traffic rules and laws, must be addressed. Since models that can cover all system setups and design domains at once are not yet foreseeable, model adaptation plays a central role. Therefore, a simulation-based study on transfer learning techniques is conducted on the basis of a transformer-based model. Furthermore, the study aims to provide insights into possible trade-offs between computational time and performance to support effective transfers into the real world.

Paper Structure

This paper contains 16 sections, 13 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Simplified representation of the MTR architecture.
  • Figure 2: Learning rate schedule across training epochs.
  • Figure 3: Visual comparison between the source baseline model and the fine-tuned model on a scenario from the CMD target dataset. Here, a vehicle is proceeding forward. The focal vehicle and its ground-truth trajectory are green, while its 6 predicted trajectories are red, with higher opacity indicating higher confidence. Neighboring vehicles and their ground-truth trajectories are blue, while their predicted trajectories are purple.
  • Figure 4: Visual demonstration of catastrophic forgetting of the fine-tuned model on an intersection scenario from the WOMD source dataset. Here, the vehicle begins to execute a U-turn. The focal vehicle and its ground-truth trajectory are green, while its 6 predicted trajectories are red, with higher opacity indicating higher confidence. Neighboring vehicles and their ground-truth trajectories are blue, while their predicted trajectories are purple.
  • Figure 5: Total training duration for each model.
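Figures 3 and 4 show six predicted trajectories per focal vehicle. A common way to score such multi-modal predictions is best-of-K displacement error (minADE/minFDE), and forgetting on the source set can be expressed as the change of such an error after fine-tuning. The metrics the paper actually reports are not listed here, so this is a minimal sketch under stated assumptions: 2D positions sampled at the same timestamps as the ground truth, and hypothetical helper names `min_ade_fde` and `forgetting`.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position, units assumed to be meters


def min_ade_fde(modes: Sequence[Sequence[Point]], gt: Sequence[Point]) -> Tuple[float, float]:
    """Best-of-K average (ADE) and final (FDE) displacement error for one agent.

    `modes` holds K predicted trajectories (K = 6 in Figures 3 and 4), each sampled
    at the same T timestamps as the ground-truth trajectory `gt`. The two minima are
    taken independently over the modes.
    """
    if not modes or not gt:
        raise ValueError("need at least one mode and a non-empty ground truth")
    ades, fdes = [], []
    for traj in modes:
        if len(traj) != len(gt):
            raise ValueError("every mode must cover the ground-truth horizon")
        # Euclidean error at each timestep between prediction and ground truth.
        errors = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(traj, gt)]
        ades.append(sum(errors) / len(errors))
        fdes.append(errors[-1])
    return min(ades), min(fdes)


def forgetting(error_before: float, error_after: float) -> float:
    """Change of a source-set error after fine-tuning; positive means forgetting."""
    return error_after - error_before


if __name__ == "__main__":
    gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    modes = [
        [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m lateral offset
        [(0.0, 0.0), (1.0, 0.0), (2.0, 1.5)],  # exact until a 1.5 m final error
    ]
    print(min_ade_fde(modes, gt))  # (0.5, 1.0)
```

In this usage example the lowest ADE and the lowest FDE come from different modes. This is why some benchmarks instead report minADE on the mode selected by FDE; which convention applies to the paper's results is not stated here.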