AdaViPro: Region-based Adaptive Visual Prompt for Large-Scale Models Adapting

Mengyu Yang, Ye Tian, Lanshan Zhang, Xiao Liang, Xuming Ran, Wendong Wang

TL;DR

This work proposes a region-based Adaptive Visual Prompt, named AdaViPro, which integrates the 'where to add' optimization of the prompt into the learning process by reconceptualizing it as a problem of regional decision-making.

Abstract

Recently, prompt-based methods have emerged as a new parameter-efficient fine-tuning paradigm that fine-tunes only a small number of additional parameters while keeping the original model frozen. However, despite their notable results, existing prompt methods focus mainly on 'what to add' and overlook the equally important question of 'where to add', typically relying on manually crafted placement. To this end, we propose a region-based Adaptive Visual Prompt, named AdaViPro, which integrates the 'where to add' optimization of the prompt into the learning process. Specifically, we reconceptualize the 'where to add' optimization as a problem of regional decision-making. During inference, AdaViPro generates a regionalized binary mask map (composed of 0s and 1s) over the whole image that designates whether to apply or discard the prompt in each specific area. Because sampling such discrete decisions is not differentiable, we employ Gumbel-Softmax sampling to enable end-to-end learning of AdaViPro through standard back-propagation. Extensive experiments demonstrate that AdaViPro yields new efficiency and accuracy trade-offs for adapting pre-trained models.
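The abstract names Gumbel-Softmax sampling as the device that makes the binary apply/discard decisions trainable. As a minimal, framework-free sketch (an assumption for illustration, not the authors' code), each region can be given two logits, one for 'apply' and one for 'discard'; perturbing them with Gumbel noise and taking a temperature-scaled softmax yields a relaxed sample, and the larger entry gives the 0/1 mask value. The pair-of-logits layout and the names `gumbel_softmax` and `sample_region_mask` are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed (soft) sample from a categorical distribution over `logits`.

    Adds Gumbel(0, 1) noise g_i = -log(-log(u_i)) to each logit and applies a
    temperature-scaled softmax: y_i = exp((l_i + g_i) / tau) / sum_j exp(...).
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_region_mask(region_logits, tau=1.0, seed=None):
    """Sample a binary apply/discard decision for every region.

    region_logits: list of (apply_logit, discard_logit) pairs, one per region
    (a hypothetical layout chosen for this sketch).
    Returns (hard_mask, soft_probs): hard_mask[k] is 1 if region k keeps the
    prompt and 0 otherwise; soft_probs[k] is the relaxed probability of 'apply'.
    """
    rng = random.Random(seed)
    hard_mask, soft_probs = [], []
    for apply_logit, discard_logit in region_logits:
        y = gumbel_softmax([apply_logit, discard_logit], tau, rng)
        # In an autograd framework, the straight-through trick would use this
        # hard decision in the forward pass and back-propagate through the
        # soft y; this pure-Python sketch only computes both values.
        hard_mask.append(1 if y[0] >= y[1] else 0)
        soft_probs.append(y[0])
    return hard_mask, soft_probs


# Example: two confident regions and one undecided region.
mask, probs = sample_region_mask([(50.0, -50.0), (-50.0, 50.0), (0.0, 0.0)], tau=0.5, seed=0)
print(mask[:2])  # → [1, 0]; the third entry depends on the sampled noise
```

In an actual training loop the logits would come from the mask generator and the relaxation would run inside an autograd framework (PyTorch, for instance, provides `torch.nn.functional.gumbel_softmax` with a `hard` flag). Lowering `tau` pushes the soft samples closer to one-hot vectors at the cost of higher-variance gradients.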

Paper Structure (14 sections, 7 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Visualization of pixel-level VP [vp] with prompt widths of 30 and 60. The fixed-position prompt covers image regions that carry information about the label.
  • Figure 2: Performance of VP [vp] and AdaViPro on CIFAR10, CIFAR100, DTD, and UCF101 across prompt sizes of {5, 60, 90, 112}.
  • Figure 3: The overall architecture of AdaViPro, which mainly consists of an edge detector and a mask generator. We retain the original VP pipeline, as described in the section on VP. Best viewed in color.
  • Figure 4: Qualitative examples showing the effectiveness of AdaViPro. 1st column: raw images, shown as references for comparison. 2nd and 4th columns: VP with widths of 30 and 60. 3rd and 5th columns: AdaViPro with widths of 30 and 60.
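Figures 1, 3 and 4 contrast VP's fixed border prompt with AdaViPro's region-wise placement. The sketch below assumes, following VP's padding prompt, that a learnable prompt is added to the image only inside a border band of the given width. It further assumes (this is not stated in the summary above) that AdaViPro gates that prompt with its region mask, giving x' = x + B ⊙ M ⊙ v, where B is the border mask, M is the pixel-level expansion of the region decisions, and v is the prompt. Images are single-channel square grids of floats for brevity, and all function names are hypothetical.

```python
def padding_mask(size, width):
    """Return a size x size grid with 1 on the border band of `width` pixels and 0 inside."""
    return [
        [1 if (i < width or j < width or i >= size - width or j >= size - width) else 0
         for j in range(size)]
        for i in range(size)
    ]


def upsample_regions(region_mask, size):
    """Expand a g x g grid of 0/1 region decisions to size x size pixels (nearest neighbour)."""
    g = len(region_mask)
    return [[region_mask[i * g // size][j * g // size] for j in range(size)]
            for i in range(size)]


def apply_prompt(image, prompt, width, region_mask=None):
    """Compute x' = x + B * M * v element-wise.

    B is the fixed border mask of the padding prompt; M is the upsampled region
    mask, taken as all ones when region_mask is None (plain VP behaviour).
    """
    size = len(image)
    border = padding_mask(size, width)
    if region_mask is None:
        regions = [[1] * size for _ in range(size)]
    else:
        regions = upsample_regions(region_mask, size)
    return [
        [image[i][j] + border[i][j] * regions[i][j] * prompt[i][j] for j in range(size)]
        for i in range(size)
    ]


# Example: an 8 x 8 blank image, an all-ones prompt of width 2, and a 2 x 2
# region mask that keeps the prompt only in the top-left and bottom-right quadrants.
image = [[0.0] * 8 for _ in range(8)]
prompt = [[1.0] * 8 for _ in range(8)]
plain = apply_prompt(image, prompt, width=2)
gated = apply_prompt(image, prompt, width=2, region_mask=[[1, 0], [0, 1]])
print(sum(map(sum, plain)), sum(map(sum, gated)))  # → 48.0 24.0
```

Setting every region to 1 recovers plain VP, so in this sketch the region mask can only remove prompt pixels from the border band; it never places the prompt outside it.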