Diversity Covariance-Aware Prompt Learning for Vision-Language Models

Songlin Dong; Zhengdong Zhou; Chenhao Ding; Xinyuan Gao; Alex Kot; Yihong Gong

TL;DR

The paper addresses the challenge of adapting large vision-language models to few-shot tasks, where feature distributions become anisotropic because data is limited. It introduces the Diversity Covariance-Aware (DCA) framework, which combines covariance-aware modeling based on an anisotropic Mahalanobis distance with diversity-aware prompts that capture multi-faceted category attributes. The authors derive a theoretically grounded classifier, apply covariance shrinkage for stability, and optimize multiple diverse prompts with text separation to improve generalization. Across 11 datasets and in domain-generalization scenarios, DCA yields substantial performance gains over zero-shot CLIP and contemporary prompt-tuning methods, highlighting the practical value of distribution-aware prompt learning for real-world few-shot applications.

Abstract

Prompt tuning can further enhance the performance of vision-language models across various downstream tasks (e.g., few-shot learning), enabling them to better adapt to specific applications and needs. In this paper, we present a Diversity Covariance-Aware framework that learns distributional information from the data to enhance the few-shot ability of the prompt model. First, we propose a covariance-aware method that models the covariance relationships between visual features and uses the anisotropic Mahalanobis distance, instead of the suboptimal cosine distance, to measure the similarity between the two modalities. We rigorously derive and prove the validity of this modeling process. Then, we propose a diversity-aware method, which learns multiple diverse soft prompts to capture different attributes of categories and aligns them independently with the visual modality. This method achieves multi-centered covariance modeling, leading to more diverse decision boundaries. Extensive experiments on 11 datasets across various tasks demonstrate the effectiveness of our method.
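
To make the contrast with cosine distance concrete, the following is a minimal NumPy sketch of Mahalanobis-distance scoring with a shrunk covariance estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names `shrunk_covariance` and `mahalanobis_scores`, the single shared covariance matrix, the shrinkage target (a scaled identity), and the mixing weight `alpha` are all hypothetical choices.

```python
import numpy as np


def shrunk_covariance(features, alpha=0.5):
    """Sample covariance blended with a scaled identity.

    The shrinkage form and the weight `alpha` are assumptions made
    for illustration; the paper's exact scheme may differ.
    """
    n, d = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(n - 1, 1)
    # Isotropic target with the same average variance as the sample covariance.
    target = (np.trace(cov) / d) * np.eye(d)
    return (1.0 - alpha) * cov + alpha * target


def mahalanobis_scores(image_feats, text_feats, cov):
    """Negative squared Mahalanobis distance, shape (num_images, num_classes)."""
    precision = np.linalg.inv(cov)
    # Pairwise differences between every image and every class text feature.
    diff = image_feats[:, None, :] - text_feats[None, :, :]  # (N, C, D)
    dist_sq = np.einsum("ncd,de,nce->nc", diff, precision, diff)
    return -dist_sq  # larger score means more similar
```

With few samples the raw sample covariance is rank-deficient and cannot be inverted, which is why some form of shrinkage is needed before computing the distance. When the covariance is the identity, the score reduces to the negative squared Euclidean distance, so the anisotropic metric can be read as a generalization of the isotropic case.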
