SOP^2: Transfer Learning with Scene-Oriented Prompt Pool on 3D Object Detection

Ching-Hung Cheng, Hsiu-Fu Wu, Bing-Chen Wu, Khanh-Phong Bui, Van-Tin Luu, Ching-Chun Huang

TL;DR

This work investigates prompt-based transfer learning for 3D object detection, introducing a Scene-Oriented Prompt Pool (SOP^2) that tailors prompts to scene partitions. It progresses from simple prompt tokens and prompt generators to a dynamic, pool-driven approach that selects per-partition prompts, achieving stronger cross-domain performance with far fewer trainable parameters than full fine-tuning. Extensive KITTI experiments, aided by Waymo pretraining, show that SOP^2 outperforms conventional PEFT methods and benefits from synergy with LoRA. The results underscore the potential of prompts to bridge domain gaps in 3D perception and open avenues for prompt-centric research in 3D vision.

Abstract

Large Language Models (LLMs) such as GPT-3 exhibit strong generalization capabilities. Through transfer learning techniques such as fine-tuning and prompt tuning, they can be adapted to various downstream tasks with minimal parameter adjustments, an approach that is particularly common in Natural Language Processing (NLP). This paper explores the effectiveness of common prompt tuning methods in 3D object detection. We investigate whether a model trained on the large-scale Waymo dataset can serve as a foundation model and adapt to other scenarios within 3D object detection. We examine, in sequence, the impact of prompt tokens and prompt generators, and then propose a Scene-Oriented Prompt Pool (SOP$^2$). We demonstrate the effectiveness of prompt pools in 3D object detection, with the goal of inspiring future researchers to explore the potential of prompts in the 3D field.

Paper Structure

This paper contains 16 sections, 8 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Comparison of various transfer learning methods: (a) Unsupervised Domain Adaptation, (b) Low-Rank Adaptation, (c) Prompt Tuning and (d) Prompt Pool. Green text represents the source domain, while red text represents the target domain.
  • Figure 2: The architecture of (a) the PT block, which adds a prompt token to $\textbf{S}_j$, and (b) the PG block, which adds a prompt generator to $\textbf{S}_j$.
  • Figure 3: The overall architecture of our proposed SOP$^2$. The upper part shows the overall pipeline, while the lower part illustrates a single SOP$^2$ block. Each set partition is assigned a corresponding prompt pool, allowing the set to select suitable prompt tokens from the prompt pool.
  • Figure 4: t-SNE visualization. (a) Distribution of the set partitions $\textbf{S}_j$, where each color represents a different partition, and (b) distribution of the prompt pools $PP_j$ corresponding to the set partitions $\textbf{S}_j$, where each color represents a different prompt pool.
  • Figure 5: Comparison of 3D mAP (40 recall positions) across prompt lengths $n_P$ and top-$K$ selection values, with the prompt pool size fixed at $M = 40$.
  • ...and 1 more figures
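
The captions for Figures 3 and 5 describe each set partition as having its own prompt pool of size $M$, from which the top-$K$ prompts of length $n_P$ are selected. The sketch below illustrates one way such a selection could work. It is not the authors' implementation: the cosine-similarity key matching, the mean-pooled query, the choice to prepend the selected prompts, and every name in the code (`select_prompts`, `make_pool`, `mean_pool`) are assumptions made for illustration. Only $M = 40$ comes from the Figure 5 caption; the other sizes are arbitrary.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pool(tokens):
    """Average a list of token vectors into one query vector (assumed query)."""
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]


def select_prompts(query, keys, prompts, top_k):
    """Pick the top_k pool entries whose keys best match the query.

    Returns the chosen indices (best first) and the chosen prompts
    flattened into one list of prompt tokens.
    """
    scores = [cosine(query, k) for k in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    selected = [tok for i in chosen for tok in prompts[i]]
    return chosen, selected


def make_pool(pool_size, prompt_len, dim, rng):
    """Create a random pool: one key and prompt_len tokens per entry.

    In a trained model these would be learnable parameters.
    """
    keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pool_size)]
    prompts = [
        [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(prompt_len)]
        for _ in range(pool_size)
    ]
    return keys, prompts


rng = random.Random(0)
M, n_P, K = 40, 4, 2          # pool size (Figure 5), prompt length, top-K
dim, num_partitions, tokens_per_partition = 8, 3, 5  # hypothetical sizes

# One pool per set partition, as described in the Figure 3 caption.
pools = [make_pool(M, n_P, dim, rng) for _ in range(num_partitions)]
partitions = [
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(tokens_per_partition)]
    for _ in range(num_partitions)
]

augmented = []
for (keys, prompts), tokens in zip(pools, partitions):
    query = mean_pool(tokens)
    _, selected = select_prompts(query, keys, prompts, K)
    # Assumption: selected prompt tokens are prepended to the partition.
    augmented.append(selected + tokens)
```

With these sizes, each partition grows from 5 tokens to $K \cdot n_P + 5 = 13$ tokens. In a trainable version, only the pool keys and prompts would need gradients while the pretrained detector stays frozen, which is in keeping with the parameter-efficiency goal stated in the TL;DR.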