Learning Optimal Prompt Ensemble for Multi-source Visual Prompt Transfer
Enming Zhang, Liwen Cao, Yanru Wu, Zijie Zhao, Yang Li
TL;DR
We address efficient adaptation of large vision transformers by learning an optimal ensemble over multiple source prompts. HGPrompt introduces a differentiable transferability metric to quantify the informativeness of prompt-induced features and a gradient alignment regularization to suppress cross-prompt interference, optimizing prompt weights under a convex objective. The approach jointly tunes the target prompt and the classification head, enabling robust, scalable transfer across diverse VTAB tasks. Empirical results on VTAB with ViT-B/16 show state-of-the-art average accuracy and strong performance on fine-grained and reasoning tasks, validating the effectiveness of dynamic multi-source prompt transfer. This framework offers interpretability of source-task relevance and a practical pathway for scalable, privacy-preserving model adaptation.
Abstract
Prompt tuning has emerged as a lightweight strategy for adapting foundation models to downstream tasks, particularly for resource-constrained systems. As pre-trained prompts become valuable assets, combining multiple source prompts offers a promising way to improve generalization on new tasks by leveraging complementary knowledge. However, naive aggregation overlooks the fact that different source prompts contribute differently to the target task. To address this, we propose HGPrompt, a dynamic framework that learns optimal ensemble weights. These weights are optimized by jointly maximizing an information-theoretic transferability metric and minimizing gradient conflicts via a novel regularization strategy. Specifically, we propose a differentiable prompt transferability metric that captures the discriminability of prompt-induced features on the target task. Meanwhile, HGPrompt matches the gradient variances with respect to different source prompts, based on the Hessian and Fisher information, ensuring stable and coherent knowledge transfer while suppressing gradient conflicts among the prompts. Extensive experiments on the large-scale VTAB benchmark demonstrate the state-of-the-art performance of HGPrompt, validating its effectiveness in learning an optimal ensemble for multi-source prompt transfer.
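To make the idea of a weighted prompt ensemble concrete, the following is a minimal sketch, not the authors' implementation. It assumes the ensemble is a convex combination of source prompts and that the weights are kept on the simplex by a softmax over learnable scores; both are illustrative assumptions. The names `softmax`, `ensemble_prompts`, `prompt_a`, and `prompt_b` are hypothetical, and the transferability metric and gradient regularizer that would drive the scores are not shown.

```python
import math


def softmax(logits):
    """Map unconstrained scores to non-negative weights that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_prompts(source_prompts, logits):
    """Weighted sum of source prompts (assumed form, not from the paper).

    Each prompt is a list of token vectors; all prompts share one shape.
    """
    weights = softmax(logits)
    n_tokens = len(source_prompts[0])
    dim = len(source_prompts[0][0])
    combined = [[0.0] * dim for _ in range(n_tokens)]
    for w, prompt in zip(weights, source_prompts):
        for t in range(n_tokens):
            for d in range(dim):
                combined[t][d] += w * prompt[t][d]
    return combined, weights


# Two toy source prompts, each with 2 tokens of dimension 3 (made-up values).
prompt_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
prompt_b = [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

# Equal scores give equal weights, so the result is the plain average.
combined, weights = ensemble_prompts([prompt_a, prompt_b], logits=[0.0, 0.0])
```

In a full pipeline the scores passed as `logits` would be trained against the target-task objective, so that more relevant source prompts receive larger weights; here they are fixed by hand purely for illustration.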
