Incremental Object Detection with Prompt-based Methods

Matthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel, Luc Van Gool

TL;DR

The paper investigates visual prompt tuning for incremental object detection (IOD) in a domain-incremental setting and benchmarks three prompt-based IL methods (L2P, DualPrompt, S-Prompt) on the challenging D-RICO benchmark. Prompt-based methods generally underperform replay-based baselines; among them, DualPrompt performs best, especially when the output layer is fixed or when deeper prompting is used. Replaying a small fraction of previous data is a strong, simple baseline. Prompt length and initialization critically influence performance: smaller initial values help, and deep prompting benefits from longer prompts. The work provides baselines and practical guidance for future prompt-based IL in object detection and shows that rehearsal remains important in practical IL scenarios.

Abstract

Visual prompt-based methods have seen growing interest in incremental learning (IL) for image classification. These approaches learn additional embedding vectors while keeping the model frozen, making them efficient to train. However, no prior work has applied such methods to incremental object detection (IOD), leaving their generalizability unclear. In this paper, we analyze three different prompt-based methods under a complex domain-incremental learning setting. We additionally provide a wide range of reference baselines for comparison. Empirically, we show that the prompt-based approaches we tested underperform in this setting. However, a strong yet practical method, combining visual prompts with replaying a small portion of previous data, achieves the best results. Together with additional experiments on prompt length and initialization, our findings offer valuable insights for advancing prompt-based IL in IOD.
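
The replay idea mentioned in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names (`subsample_for_replay`, `training_set_for_task`), the 10% replay fraction, and the toy string "images" are all assumptions made for the example, and the actual prompt training step is omitted.

```python
import random

def subsample_for_replay(task_data, fraction, rng):
    """Keep a random `fraction` of a finished task's samples for later replay.

    Hypothetical helper: the fraction and the uniform sampling are assumptions.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    k = round(fraction * len(task_data))
    return rng.sample(task_data, k)

def training_set_for_task(current_data, memory):
    """Current-task samples plus everything kept from earlier tasks."""
    return list(current_data) + [s for kept in memory for s in kept]

rng = random.Random(0)
# Toy stand-in data: three tasks (domains) with 100 "images" each.
tasks = [[f"task{t}_img{i}" for i in range(100)] for t in range(3)]

memory = []   # one replay subset per finished task
sizes = []    # training-set size seen at each task
for data in tasks:
    train_set = training_set_for_task(data, memory)
    sizes.append(len(train_set))
    # ... train the prompts on train_set here (omitted) ...
    memory.append(subsample_for_replay(data, fraction=0.1, rng=rng))
```

With these toy settings, each new task is trained on its own 100 samples plus 10 stored samples per earlier task, so the replayed data stays a small share of every training set.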

Paper Structure

This paper contains 18 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Incremental learning results on the D-RICO benchmark. The left plot shows overall performance $\overline{\bm{\mathrm{mAP}}}$ versus forgetting (FM), and the right plot shows plasticity (FWT) versus FM. The three prompt-based IL methods are far from the optimum of high plasticity and low forgetting (upper left corner).
  • Figure 2: Number of times each prompt length led to the best mAP in each of the three prompting categories: a) shallow prompting where the prompt is removed after the first layer, b) shallow prompting where the prompt is kept, and c) deep prompting. Larger prompt lengths work well for deep prompting, whereas shallow prompting does best with somewhat shorter prompts.
  • Figure 3: Results for different prompt initialization intervals $[-\mathrm{init}, \mathrm{init}]$, averaged over various prompt lengths and injection layers, for task 4 of D-RICO. For smaller values of $\mathrm{init}$, the results stabilize and are better than for larger intervals.
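
The three prompting categories of Figure 2 and the initialization interval of Figure 3 can be made concrete with a toy sketch. Everything here is an assumption for illustration: `frozen_layer` is a stand-in for a frozen transformer layer, the deep variant follows the common VPT-Deep convention of replacing the prompt positions with fresh prompts at every layer, and the names `init_prompt` and `forward`, the mode strings, and the value `init=0.01` are hypothetical rather than taken from the paper.

```python
import random

def init_prompt(length, dim, init, rng):
    """Sample `length` prompt vectors uniformly from [-init, init]."""
    return [[rng.uniform(-init, init) for _ in range(dim)] for _ in range(length)]

def frozen_layer(tokens):
    """Toy stand-in for a frozen layer: blends each token with the sequence mean."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[0.5 * t[d] + 0.5 * mean[d] for d in range(dim)] for t in tokens]

def forward(patches, prompts, mode, num_layers):
    """Run patch tokens through `num_layers` frozen layers with prompts.

    prompts: one prompt (list of vectors) per layer, all of equal length;
             the two shallow modes only use prompts[0].
    mode:    "shallow_remove", "shallow_keep", or "deep".
    Returns the patch-token outputs and the sequence length seen by each layer.
    """
    p = len(prompts[0])
    x = prompts[0] + patches              # prepend prompt tokens at the input
    seq_lens = []
    for layer in range(num_layers):
        if mode == "deep" and layer > 0:
            x = prompts[layer] + x[p:]    # swap in this layer's own prompt
        seq_lens.append(len(x))
        x = frozen_layer(x)
        if mode == "shallow_remove" and layer == 0:
            x = x[p:]                     # drop prompt tokens after layer 1
    if mode != "shallow_remove":
        x = x[p:]                         # strip prompt tokens from the output
    return x, seq_lens

rng = random.Random(0)
patches = [[1.0, 2.0] for _ in range(4)]                     # 4 toy patch tokens
prompts = [init_prompt(2, 2, 0.01, rng) for _ in range(3)]   # length-2 prompts

out_remove, lens_remove = forward(patches, prompts, "shallow_remove", 3)
out_keep, lens_keep = forward(patches, prompts, "shallow_keep", 3)
out_deep, lens_deep = forward(patches, prompts, "deep", 3)
```

In this sketch only the prompt vectors would be trainable. The per-layer sequence lengths show the difference between the modes: the "remove" variant processes the longer sequence only in the first layer, while the other two carry prompt tokens through every layer.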