Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models

Jifeng Wang, Kaouther Messaoud, Yuejiang Liu, Juergen Gall, Alexandre Alahi

TL;DR

Forecast-PEFT addresses the inefficiency of fine-tuning pre-trained motion forecasting models by freezing the pre-trained encoder and decoder and introducing Contextual Embedding Prompts, Modality-Control Prompts, and Parallel Adapters as trainable components for parameter-efficient adaptation. The approach supports cross-dataset fine-tuning through a paradigm of universal pre-training followed by PEFT specialization, achieving competitive accuracy while training only about 20% of the parameters that full fine-tuning trains. An extended variant, Forecast-FT, achieves further performance gains by fine-tuning all parameters. The work provides strong empirical evidence of efficiency gains, robust cross-dataset generalization, and practical applicability to autonomous driving systems with limited computational resources.
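To make the trainable-parameter comparison concrete, here is a minimal Python sketch of the bookkeeping, assuming a model can be described as a list of named modules with parameter counts. The `Module` class, the `freeze`, `count_trainable`, and `stored_params` helpers, and every parameter count below are hypothetical toy values for illustration, not the paper's code or numbers.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Module:
    """A named block of parameters (hypothetical stand-in for a network module)."""
    name: str
    num_params: int
    trainable: bool = True


def freeze(modules: List[Module], frozen_names: Set[str]) -> None:
    """Mark every module whose name is in frozen_names as non-trainable."""
    for m in modules:
        if m.name in frozen_names:
            m.trainable = False


def count_trainable(modules: List[Module]) -> int:
    """Sum the parameter counts of modules that are still trainable."""
    return sum(m.num_params for m in modules if m.trainable)


def stored_params(backbone: int, plugin: int, n_datasets: int) -> int:
    """Parameters stored when one frozen backbone is shared by n_datasets plug-ins."""
    return backbone + n_datasets * plugin


# Toy parameter counts, made up for illustration (not taken from the paper).
full_ft = [Module("encoder", 600), Module("decoder", 400)]
peft = [Module("encoder", 600), Module("decoder", 400),
        Module("prompts", 30), Module("adapters", 120)]
freeze(peft, {"encoder", "decoder"})  # pre-trained weights stay fixed

ratio = count_trainable(peft) / count_trainable(full_ft)
print(f"PEFT trains {ratio:.0%} of the parameters that full fine-tuning trains")

# Storage for 3 datasets: shared backbone + per-dataset plug-ins vs. 3 full copies.
print(stored_params(backbone=1000, plugin=150, n_datasets=3), "vs", 3 * 1000)
```

Under this accounting, each additional dataset costs one more plug-in rather than a full model copy, which mirrors the plug-in picture in Figure 1(c).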

Abstract

Recent progress in motion forecasting has been driven substantially by self-supervised pre-training. However, adapting pre-trained models to specific downstream tasks, especially motion prediction, through extensive fine-tuning is often inefficient. This inefficiency arises because motion prediction closely aligns with the masked pre-training tasks, and traditional full fine-tuning fails to fully exploit this alignment. To address this, we introduce Forecast-PEFT, a fine-tuning strategy that freezes the majority of the model's parameters and focuses adjustments on newly introduced prompts and adapters. This approach preserves the pre-learned representations and substantially reduces the number of parameters that must be retrained, thereby improving efficiency. Because the method also adapts efficiently to different datasets, it delivers robust performance across datasets without extensive retraining. Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning on motion prediction, achieving higher accuracy with only 17% of the trainable parameters typically required. Moreover, our comprehensive adaptation, Forecast-FT, further improves prediction performance, yielding up to a 9.6% improvement over conventional baseline methods. Code will be available at https://github.com/csjfwang/Forecast-PEFT.
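The core idea of keeping pre-trained weights frozen and training only newly added prompts and adapters can be sketched in NumPy as follows. This is a generic illustration under stated assumptions: it uses standard prompt tuning (learnable tokens prepended to the input sequence) and a bottleneck adapter run in parallel with a frozen feed-forward sub-layer, as in the parallel-adapter formulation from the PEFT literature. The names `frozen_mlp`, `ParallelAdapter`, and `prepend_prompts`, the dimensions, and the scaling factor are hypothetical; the paper's exact placement and sizing of its prompts and adapters may differ.

```python
import numpy as np

rng = np.random.default_rng(0)


def frozen_mlp(x, w1, w2):
    """Stand-in for a frozen pre-trained feed-forward sub-layer (weights never updated)."""
    return np.maximum(x @ w1, 0.0) @ w2


class ParallelAdapter:
    """Trainable bottleneck branch added in parallel to a frozen sub-layer (hypothetical)."""

    def __init__(self, d_model, bottleneck, scale=0.1):
        self.w_down = rng.normal(0.0, 0.02, size=(d_model, bottleneck))
        # Zero-initialized up-projection: the adapter starts as a no-op.
        self.w_up = np.zeros((bottleneck, d_model))
        self.scale = scale

    def __call__(self, x):
        return self.scale * (np.maximum(x @ self.w_down, 0.0) @ self.w_up)

    def num_params(self):
        return self.w_down.size + self.w_up.size


def block_with_adapter(x, w1, w2, adapter):
    # Residual input + frozen branch + trainable parallel branch.
    return x + frozen_mlp(x, w1, w2) + adapter(x)


def prepend_prompts(tokens, prompts):
    """Concatenate learnable prompt tokens in front of the input token sequence."""
    return np.concatenate([prompts, tokens], axis=0)


d_model, hidden, n_tokens = 16, 64, 10
w1 = rng.normal(size=(d_model, hidden))  # frozen
w2 = rng.normal(size=(hidden, d_model))  # frozen
adapter = ParallelAdapter(d_model, bottleneck=4)  # trainable
prompts = rng.normal(0.0, 0.02, size=(3, d_model))  # 3 trainable prompt tokens

x = rng.normal(size=(n_tokens, d_model))
seq = prepend_prompts(x, prompts)  # shape (13, 16)
out = block_with_adapter(seq, w1, w2, adapter)
```

Zero-initializing the up-projection is a common design choice: at the start of fine-tuning the adapted block reproduces the frozen block exactly, which is one way to avoid disturbing the pre-learned representations. Only `adapter` and `prompts` would receive gradient updates.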

Paper Structure (45 sections, 7 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: (a) Comparison of trainable parameters (percentage), performance (minFDE), and catastrophic forgetting (reconstruction error). The dashed line indicates the reconstruction error of the pre-trained baseline, Forecast-MAE (PRE). The colored circles represent the fully fine-tuned baseline, Forecast-MAE (FT), and our PEFT methods: the additive-modules-only variant, Forecast-PEFT(A), and the default model, Forecast-PEFT. (b) Traditional pre-training and fine-tuning on different datasets. (c) Our Forecast-PEFT method offers flexible cross-dataset fine-tuning, requiring only a single pre-training on a large dataset followed by efficient fine-tuning on various datasets. Its tunable parameters act as a plug-in module, so adapting to each new dataset requires training only 20% of the parameters.
  • Figure 2: Forecast-PEFT vs. traditional full fine-tuning: (a) Traditional methods, exemplified by Forecast-MAE [cheng2023forecast], use a pre-trained encoder and attach a new, randomly initialized decoder. (b) Forecast-PEFT retains the pre-trained decoder, originally trained for masked motion/map reconstruction, and adapts it to future motion forecasting by integrating Modality-Control Prompts into the decoder, Contextual Embedding Prompts into the encoder, and Parallel Adapters.
  • Figure 3: Padding missing time steps with masking. For datasets with shorter history and future horizons (e.g., AV1) and datasets with lower sampling frequencies (e.g., nuScenes), padding the missing time frames is effective.
  • Figure 4: Pre-training time comparison. Training time is measured using 4 GPUs.
  • Figure 5: Efficiency of fine-tuning on different percentages of the dataset. All results are evaluated on the AV2 validation set.
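The padding idea from Figure 3 can be sketched as placing each observed frame on a fixed target time grid and marking every unfilled slot as masked, so a model pre-trained on that grid sees a familiar input layout. The function `pad_to_grid`, the end-aligned placement, and the grid and history sizes in the usage lines (a 50-step, 10 Hz grid; an AV1-style 20-frame history at 10 Hz; a nuScenes-style 5-frame history at 2 Hz) are assumptions for illustration. Only the history side is shown; the same idea would apply to future horizons. In a masked-autoencoder setup the masked slots would typically be filled with a learnable mask token rather than `None`.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def pad_to_grid(
    track: List[Point],
    src_hz: int,
    dst_hz: int,
    dst_len: int,
) -> Tuple[List[Optional[Point]], List[bool]]:
    """Place a history track on a fixed target time grid, masking missing steps.

    The most recent observation is aligned with the last grid slot. Slots with
    no observation are left as None and flagged True in the mask.
    """
    if dst_hz % src_hz != 0:
        raise ValueError("dst_hz must be an integer multiple of src_hz")
    stride = dst_hz // src_hz  # grid slots between consecutive observations
    grid: List[Optional[Point]] = [None] * dst_len
    mask = [True] * dst_len  # True = missing, to be masked
    # Walk backwards from the latest observation.
    for k, point in enumerate(reversed(track)):
        idx = dst_len - 1 - k * stride
        if idx < 0:
            break  # older than the grid covers; drop it
        grid[idx] = point
        mask[idx] = False
    return grid, mask


# AV1-style history (assumed 20 frames at 10 Hz) on a 50-step, 10 Hz grid.
av1_track = [(float(i), 0.0) for i in range(20)]
grid, mask = pad_to_grid(av1_track, src_hz=10, dst_hz=10, dst_len=50)

# nuScenes-style history (assumed 5 frames at 2 Hz) on the same grid.
nus_track = [(float(i), 0.0) for i in range(5)]
grid2, mask2 = pad_to_grid(nus_track, src_hz=2, dst_hz=10, dst_len=50)
```

In the first case the 30 oldest slots are masked (shorter history); in the second, observations land every 5 slots and the gaps between them are masked (lower sampling frequency).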