PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation

Chikai Shang, Mengke Li, Yiqun Zhang, Zhen Chen, Jinlin Wu, Fangqing Gu, Yang Lu, Yiu-ming Cheung

TL;DR

PRO-VPT tackles the limitation of fixed prompt distributions in visual prompt tuning by formulating adaptive distribution optimization (ADO) and integrating it with VPT in a nested optimization framework. It introduces a two-step prompt relocation (PR) procedure that learns task-specific prompt distributions across Transformer blocks: idle prompts are pruned using an idleness score, then allocated to the blocks that need them most via PPO-based reinforcement learning. The method formalizes the co-design as a nested problem, with $\mathcal{D}^* = \arg\min_{\mathcal{D}} \mathbb{E}_{(\mathbf{x}, y) \in \mathcal{T}_{tr}}[\mathcal{L}(f_{\mathbf{P}^*, \mathcal{D}}(\mathbf{x}), y)]$ and $\mathbf{P}^* = \arg\min_{\mathbf{P}} \mathbb{E}_{(\mathbf{x}, y) \in \mathcal{T}_{tr}}[\mathcal{L}(f_{\mathbf{P}, \mathcal{D}^*}(\mathbf{x}), y)]$, so that the optimal distribution $\mathcal{D}^*$ and the optimal prompts $\mathbf{P}^*$ each depend on the other. Empirically, PRO-VPT achieves state-of-the-art average accuracies among prompt-based methods on VTAB-1k (78.0%) and FGVC (91.7%), surpassing VPT by 1.6 pp and 2.0 pp, respectively, and demonstrates strong generalization across backbones and pre-training strategies. The approach offers a practical, parameter-efficient path to task-specific vision adaptation with improved stability and robustness compared to static prompting.

Abstract

Visual prompt tuning (VPT), i.e., fine-tuning some lightweight prompt tokens, provides an efficient and effective approach for adapting pre-trained models to various downstream tasks. However, most prior art indiscriminately uses a fixed prompt distribution across different tasks, neglecting that the importance of each block varies with the task. In this paper, we introduce adaptive distribution optimization (ADO) by tackling two key questions: (1) how to appropriately and formally define ADO, and (2) how to design an adaptive distribution strategy guided by this definition. Through empirical analysis, we first confirm that properly adjusting the distribution significantly improves VPT performance, and further uncover a key insight that a nested relationship exists between ADO and VPT. Based on these findings, we propose a new VPT framework, termed PRO-VPT (iterative Prompt RelOcation-based VPT), which adaptively adjusts the distribution built upon a nested optimization formulation. Specifically, we develop a prompt relocation strategy derived from this formulation, comprising two steps: pruning idle prompts from prompt-saturated blocks, followed by allocating these prompts to the most prompt-needed blocks. By iteratively performing prompt relocation and VPT, our proposal can adaptively learn the optimal prompt distribution in a nested optimization-based manner, thereby unlocking the full potential of VPT. Extensive experiments demonstrate that our proposal significantly outperforms advanced VPT methods, e.g., PRO-VPT surpasses VPT by 1.6 pp and 2.0 pp in average accuracy on the VTAB-1k and FGVC benchmarks, respectively, leading prompt-based methods to state-of-the-art performance. The code is available at https://github.com/ckshang/PRO-VPT.
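The two-step relocation and its alternation with prompt tuning can be illustrated with a toy sketch. This is not the authors' implementation: the function names `relocate_one_prompt` and `alternate` are hypothetical, the prompt distribution is reduced to a list of per-block prompt counts, the idleness score is assumed to be given per block, and the allocation step uses a plain argmax over an assumed per-block "need" score in place of PPO-based reinforcement learning.

```python
from typing import Callable, List, Sequence, Tuple


def relocate_one_prompt(
    counts: Sequence[int],
    idleness: Sequence[float],
    need: Sequence[float],
) -> List[int]:
    """Move one prompt from the most idle block to the most prompt-needed block.

    All three inputs hold one entry per Transformer block. The scores are
    hypothetical stand-ins supplied by the caller.
    """
    if not (len(counts) == len(idleness) == len(need)):
        raise ValueError("all inputs must have one entry per block")
    new_counts = list(counts)
    # Prune: only blocks that currently hold at least one prompt can give one up.
    donors = [i for i, c in enumerate(new_counts) if c > 0]
    if not donors:
        return new_counts
    src = max(donors, key=lambda i: idleness[i])
    new_counts[src] -= 1
    # Allocate: argmax over the need score (assumption; stands in for a learned policy).
    dst = max(range(len(new_counts)), key=lambda i: need[i])
    new_counts[dst] += 1
    return new_counts


def alternate(
    counts: Sequence[int],
    tune_step: Callable[[Sequence[int]], None],
    score_fn: Callable[[Sequence[int]], Tuple[Sequence[float], Sequence[float]]],
    rounds: int,
) -> List[int]:
    """Alternate prompt tuning under a fixed distribution with one relocation."""
    current = list(counts)
    for _ in range(rounds):
        tune_step(current)                   # update prompts, distribution fixed
        idleness, need = score_fn(current)   # re-score blocks after tuning
        current = relocate_one_prompt(current, idleness, need)
    return current


# Usage: four blocks with two prompts each; block 1 is most idle, block 2 most needed.
print(relocate_one_prompt([2, 2, 2, 2], [0.1, 0.9, 0.3, 0.2], [0.2, 0.1, 0.8, 0.4]))
# → [2, 1, 3, 2]
```

Because each relocation removes one prompt and adds one, the total number of prompts stays constant in this sketch; only their placement across blocks changes.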

Paper Structure

This paper contains 24 sections, 18 equations, 16 figures, 12 tables, 1 algorithm.

Figures (16)

  • Figure 1: PRO-VPT (ours) vs. prior art in VPT. Existing VPT approaches typically insert trainable prompts into the PVM with a pre-specified static distribution, whether shallow or deep, and optimize these prompts to drive the PVM to conduct downstream tasks. Compared to the prior art, our proposal (PRO-VPT) adaptively adjusts the prompt distribution by treating it as an optimization objective and coupling the distribution optimization with prompt tuning.
  • Figure 2: Performance gaps from distribution adjustments using prompts from epochs 25, 50, and 75. Adjusting prompt distribution appropriately leads to enhanced performance; however, effective adjustments vary significantly across different epochs.
  • Figure 3: Overview of our proposed PRO-VPT. Left: The streamlined workflow of PRO-VPT. Right: Illustration of the PR process.
  • Figure 4: Performance gains achieved by VPT w/ ADO (PRO-VPT) compared to VPT w/o ADO (VPT-Deep). PRO-VPT consistently outperforms VPT-Deep.
  • Figure 5: Visualization of prompt distributions learned by PRO-VPT on VTAB-1k Natural Cifar100, Specialized Resisc45, and Structured Clevr-Count with varying numbers of prompts. PRO-VPT effectively learns task-specific distributions.
  • ...and 11 more figures