Semantic Hierarchical Prompt Tuning for Parameter-Efficient Fine-Tuning
Haowei Zhu, Fangyuan Zhang, Rui Qin, Tianxiang Pan, Junhai Yong, Bin Wang
TL;DR
This work tackles the inefficiency of full fine-tuning by introducing SHIP, a parameter-efficient fine-tuning method that leverages semantic hierarchies within pre-trained Vision Transformers. SHIP constructs semantic levels from inter-layer affinities and assigns three prompt types: Semantic Independent Prompts, Semantic Shared Prompts, and Attribute Prompts. It uses a Prompt Matching Loss to encourage discriminative feature learning and Decoupled Attention to preserve the pre-trained attention structure. The approach yields consistent improvements over VPT and other competitive PEFT methods on VTAB-1k without substantially increasing the number of trainable parameters. Overall, SHIP provides a robust, scalable strategy for task-specific fine-tuning that improves feature aggregation and discrimination while remaining parameter-efficient.
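The TL;DR states that SHIP builds semantic levels from inter-layer affinities but does not give the grouping rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each layer is summarized by a feature vector, that affinity is the cosine similarity between adjacent layers, and that a new level starts wherever that affinity falls below a threshold. Each layer then receives its own independent prompt plus a prompt shared across its level. The names `cosine`, `group_layers`, `assign_prompts`, `threshold`, `P_ind`, and `P_shared` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def group_layers(layer_feats: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Split consecutive layers into semantic levels (hypothetical rule).

    Layer i joins the current level if its affinity with layer i-1 is at
    least `threshold`; otherwise it opens a new level.
    """
    if not layer_feats:
        return []
    levels: List[List[int]] = [[0]]
    for i in range(1, len(layer_feats)):
        if cosine(layer_feats[i - 1], layer_feats[i]) >= threshold:
            levels[-1].append(i)
        else:
            levels.append([i])
    return levels


def assign_prompts(levels: List[List[int]]) -> Dict[int, Dict[str, str]]:
    """Map each layer to a per-layer (independent) and a per-level (shared) prompt name."""
    plan: Dict[int, Dict[str, str]] = {}
    for level_id, layers in enumerate(levels):
        for layer in layers:
            plan[layer] = {
                "independent": f"P_ind[{layer}]",   # unique to this layer
                "shared": f"P_shared[{level_id}]",  # reused by every layer in the level
            }
    return plan


# Toy per-layer summaries (e.g. mean token features); the values are made up.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
levels = group_layers(feats, threshold=0.8)
print(levels)  # [[0, 1], [2, 3]]
print(assign_prompts(levels)[3])  # {'independent': 'P_ind[3]', 'shared': 'P_shared[1]'}
```

Layers 0 and 1 (and likewise 2 and 3) have a cosine affinity of about 0.99, while layers 1 and 2 have about 0.11, so the sketch cuts the four layers into two levels. In a real model the summaries and the threshold (or a different clustering rule entirely) would come from the method itself.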
Abstract
As the scale of vision models continues to grow, Visual Prompt Tuning (VPT) has emerged as a parameter-efficient transfer learning technique, noted for outperforming full fine-tuning. However, indiscriminately applying prompts to every layer without considering their inherent correlations can cause significant disturbances, leading to suboptimal transferability. Additionally, VPT disrupts the original self-attention structure, which affects the aggregation of visual features, and it lacks a mechanism for explicitly mining discriminative visual features, which are crucial for classification. To address these issues, we propose a Semantic Hierarchical Prompt (SHIP) fine-tuning strategy. We adaptively construct semantic hierarchies and use semantic-independent and semantic-shared prompts to learn hierarchical representations. We also integrate attribute prompts and a prompt matching loss to enhance feature discrimination, and we employ decoupled attention for robustness and reduced inference costs. SHIP significantly improves performance, achieving a 4.9% gain in accuracy over VPT with a ViT-B/16 backbone on VTAB-1k tasks. Our code is available at https://github.com/haoweiz23/SHIP.
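The abstract attributes two effects to decoupled attention (preserving the pre-trained self-attention and lowering inference cost) without describing the mechanism. A minimal sketch of one hypothetical reading, not the authors' code: image tokens attend only to image tokens, so their attention pattern is exactly the pre-trained one, while prompt tokens attend to both image and prompt tokens. It uses single-head attention with identity query/key/value projections for brevity (an assumption); `attend` and `decoupled_attention` are hypothetical names. The sketch contrasts this with VPT-style joint attention, where prompts enter every image token's softmax.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product attention with identity projections."""
    d = len(keys[0])
    out: List[Vec] = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out


def decoupled_attention(img: List[Vec], prompts: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
    # Hypothetical reading: image tokens attend only to image tokens, so their
    # output matches the pre-trained attention exactly.
    img_out = attend(img, img, img)
    # Prompt tokens read from image and prompt tokens alike.
    everything = img + prompts
    prompt_out = attend(prompts, everything, everything)
    return img_out, prompt_out


img = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[5.0, 5.0]]
img_out, prompt_out = decoupled_attention(img, prompts)
coupled = attend(img, img + prompts, img + prompts)  # VPT-style joint attention
print(img_out == attend(img, img, img))  # True: prompts leave image tokens untouched
print(coupled == img_out)  # False: joint attention lets prompts shift image tokens
```

Under this reading, each image-token query scores only the N image keys rather than N + P keys, which is one way the inference cost could shrink; whether SHIP realizes the saving this way is an assumption.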
