Prompt Learning via Meta-Regularization

Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim

TL;DR

ProMetaR addresses prompt overfitting in vision-language models by meta-learning both the soft prompts and a gradient-modulated regularizer, and it uses task augmentation to mitigate meta-overfitting. The framework is a bi-level optimization: an inner-loop gradient update, followed by an outer-loop update on augmented validation data, with a gradient modulation function $\mathcal{M}^{\boldsymbol{\phi}}$ and a learnable regularizer $\mathcal{R}$. The paper provides a gradient-alignment analysis and reports generalization gains over existing prompt-learning methods in base-to-base, base-to-new, and domain-generalization settings on 11 datasets and four ImageNet variants, without external data. ProMetaR is shown to be plug-and-play with various prompting approaches, which makes it practical for data-efficient adaptation of vision-language models.

Abstract

Pre-trained vision-language models have shown impressive success on various computer vision tasks thanks to their zero-shot generalizability. Recently, prompt learning approaches have been explored to adapt vision-language models efficiently and effectively to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting, since the general knowledge of the pre-trained vision-language models is forgotten while the prompts are fine-tuned on a small dataset from a specific target task. To address this issue, we propose Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and the task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks, which alleviates meta-overfitting. In addition, we provide an analysis of how ProMetaR improves the generalizability of prompt tuning from the perspective of gradient alignment. Our extensive experiments demonstrate that ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.
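
To make the gradient-alignment perspective mentioned in the abstract concrete, the following is the standard first-order argument for meta-learned updates; it is a sketch and not necessarily the derivation given in the paper. The symbols are assumptions introduced here: $\alpha$ is an inner-loop step size, $\boldsymbol{g}$ is the gradient of the training loss at the soft prompts $\Theta$, $\boldsymbol{g}_{\text{reg}}$ is the gradient of the regularizer, and $\boldsymbol{g}_{\text{val}} = \nabla_{\Theta}\mathcal{L}^{\text{val}}(\Theta)$ is the gradient of the validation loss. A first-order Taylor expansion of the validation loss after one inner step gives

$$
\mathcal{L}^{\text{val}}\Big(\Theta - \alpha\big(\boldsymbol{g} + \mathcal{M}^{\boldsymbol{\phi}}(\boldsymbol{g}_{\text{reg}};\boldsymbol{g})\big)\Big)
\approx
\mathcal{L}^{\text{val}}(\Theta)
- \alpha\, \boldsymbol{g}_{\text{val}}^{\top}\big(\boldsymbol{g} + \mathcal{M}^{\boldsymbol{\phi}}(\boldsymbol{g}_{\text{reg}};\boldsymbol{g})\big).
$$

Under this approximation, minimizing the outer objective rewards a large inner product between the validation gradient and the inner-loop update direction, that is, update directions that are aligned with what also helps on held-out (augmented) data.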

Paper Structure

This paper contains 20 sections, 22 equations, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Performance comparison of ProMetaR with prompt learning methods (Zero-shot CLIP, CoOp, CoCoOp, IVLP (base method), and ProMetaR (Ours)) under the base-to-base/base-to-new setting. We measure average accuracy on the base classes (a) and new classes (b) over 11 datasets. The red dotted line indicates the performance of the zero-shot CLIP.
  • Figure 2: ProMetaR learns the soft prompts $\Theta = \left\{\boldsymbol{\theta}^{\text{vis}},\boldsymbol{\theta}^{\text{txt}} \right\}$ with meta-regularization so that they generalize well to new tasks without losing the generalizability of the pretrained VLMs (e.g., CLIP). In the inner loop (Eq. \ref{eq:inner}), the soft prompts $\Theta$ are adapted with the gradients $\boldsymbol{g}$ of the loss $\mathcal{L}$ and the modulated gradients $\boldsymbol{g}_{\text{reg}}=\mathcal{M}^{\boldsymbol{\phi}}\left(\boldsymbol{g}_{\text{reg}};\boldsymbol{g} \right)$. In the outer loop (Eqs. \ref{eq:outer1}, \ref{eq:outer2}), the soft prompts $\Theta$ and the parameters $\boldsymbol{\phi}$ of the gradient modulation function are updated on the augmented validation set $D^{\text{val}}$. The image encoder $f$ and text encoder $g$ of the pretrained vision-language model are frozen during training.
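
The inner and outer loops described in the Figure 2 caption can be illustrated with a toy sketch in plain Python. Everything specific here is an assumption made for illustration, not the authors' implementation: a linear least-squares model stands in for the frozen encoders and prompts, the regularizer is a fixed pull toward the initial parameters `theta0`, the modulation is an element-wise sigmoid gate `sigmoid(phi)` that ignores its second argument $\boldsymbol{g}$, the task augmentation is a simple mixup of validation pairs, and the update of `theta` uses a first-order approximation. The names `meta_step`, `mixup`, and `loss_and_grad` are hypothetical.

```python
import math
import random

def loss_and_grad(theta, batch):
    """Mean squared error of a linear model and its gradient w.r.t. theta."""
    n = len(batch)
    loss, grad = 0.0, [0.0] * len(theta)
    for x, y in batch:
        err = sum(t * xi for t, xi in zip(theta, x)) - y
        loss += 0.5 * err * err / n
        for i, xi in enumerate(x):
            grad[i] += err * xi / n
    return loss, grad

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mixup(batch, rng, lam=0.5):
    """Assumed task augmentation: convex combinations of randomly paired samples."""
    shuffled = batch[:]
    rng.shuffle(shuffled)
    return [([lam * a + (1 - lam) * b for a, b in zip(x1, x2)],
             lam * y1 + (1 - lam) * y2)
            for (x1, y1), (x2, y2) in zip(batch, shuffled)]

def meta_step(theta, phi, theta0, train, val, rng, alpha=0.1, beta=0.1):
    # Inner loop: adapt theta with the loss gradient g plus the gated
    # regularizer gradient. g_reg is the gradient of 0.5 * ||theta - theta0||^2.
    _, g = loss_and_grad(theta, train)
    g_reg = [t - t0 for t, t0 in zip(theta, theta0)]
    gate = [sigmoid(p) for p in phi]
    theta_in = [t - alpha * (gi + m * ri)
                for t, gi, m, ri in zip(theta, g, gate, g_reg)]

    # Outer loop: evaluate the adapted parameters on augmented validation data.
    val_loss, g_val = loss_and_grad(theta_in, mixup(val, rng))

    # First-order approximation for theta: reuse the gradient taken at theta_in.
    new_theta = [t - beta * gv for t, gv in zip(theta, g_val)]

    # Exact gradient for phi in this toy setup:
    # d theta_in[i] / d phi[i] = -alpha * gate'(phi[i]) * g_reg[i].
    new_phi = [p + beta * alpha * m * (1 - m) * ri * gv
               for p, m, ri, gv in zip(phi, gate, g_reg, g_val)]
    return new_theta, new_phi, val_loss

# Toy data generated by the weights (2, -1); theta0 plays the "pretrained" role.
rng = random.Random(0)
theta0 = [0.0, 0.0]
theta, phi = theta0[:], [0.0, 0.0]
train = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
val = [([1.0, 2.0], 0.0), ([2.0, 1.0], 3.0),
       ([1.0, -1.0], 3.0), ([-1.0, 1.0], -3.0)]

initial_loss, _ = loss_and_grad(theta, val)
for _ in range(200):
    theta, phi, _ = meta_step(theta, phi, theta0, train, val, rng)
final_loss, _ = loss_and_grad(theta, val)
print(initial_loss, final_loss)
```

In a real setting the gradients would come from automatic differentiation through the frozen encoders, and the update of the prompts could use second-order terms instead of the first-order shortcut used here.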