Prompt Learning via Meta-Regularization

Jinyoung Park; Juyeon Ko; Hyunwoo J. Kim

TL;DR

ProMetaR addresses prompt overfitting in vision-language models by meta-learning both the soft prompts and a gradient-modulated regularizer, and uses task augmentation to mitigate meta-overfitting. The framework is a bi-level optimization: an inner-loop gradient update of the prompts, followed by an outer-loop update on augmented validation data, with a gradient modulation function $\mathcal{M}^{\boldsymbol{\phi}}$ and a learnable regularizer $\mathcal{R}$. The paper provides a gradient-alignment analysis and reports generalization gains over existing prompt-learning methods in base-to-base, base-to-new, and domain-generalization settings on 11 datasets and four ImageNet variants, without external data. ProMetaR is plug-and-play with various prompting approaches, which makes it practical for data-efficient adaptation of vision-language models.

Abstract

Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting, since the general knowledge of the pre-trained vision-language models is forgotten while the prompts are fine-tuned on a small dataset from a specific target task. To address this issue, we propose Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and the task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks to alleviate meta-overfitting. In addition, we provide an analysis of how ProMetaR improves the generalizability of prompt tuning from the perspective of gradient alignment. Our extensive experiments demonstrate that ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.
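
The abstract refers to gradient alignment without defining it on this page. As a rough illustration only, one common way to quantify how well two gradients agree is their cosine similarity. The sketch below assumes that reading; the function name `cosine_alignment` and the example vectors are hypothetical, and the paper's own analysis may define alignment differently.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def cosine_alignment(g_task, g_reg):
    """Cosine similarity between a task-loss gradient and a regularizer gradient.

    Returns a value in [-1, 1]: positive when the two gradients point in
    similar directions, negative when they conflict. Returns 0.0 if either
    gradient is the zero vector (alignment is undefined in that case).
    This is an illustrative measure, not the paper's stated definition.
    """
    norm = math.sqrt(dot(g_task, g_task)) * math.sqrt(dot(g_reg, g_reg))
    if norm == 0.0:
        return 0.0
    return dot(g_task, g_reg) / norm
```

Under this reading, a regularizer gradient with negative alignment pulls the prompts against the task gradient, which is the kind of conflict a learned gradient modulation could damp.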

Paper Structure (20 sections, 22 equations, 2 figures, 9 tables)

Introduction
Related works
Method
Preliminaries
Prompt learning via meta-regularization
Analysis of ProMetaR
Experiments
Experimental settings
Effectiveness of ProMetaR
Base-to-base/Base-to-new generalization
Domain generalization
Analysis
Conclusion
Implementation details
Evaluation metrics
...and 5 more sections

Figures (2)

Figure 1: Performance comparison of ProMetaR with prompt learning methods (Zero-shot CLIP, CoOp, CoCoOp, IVLP (base method), and ProMetaR (Ours)) under the base-to-base/base-to-new setting. We measure average accuracy on the base classes (a) and new classes (b) over 11 datasets. The red dotted line indicates the performance of the zero-shot CLIP.
Figure 2: ProMetaR learns the soft prompts $\Theta = \left\{\boldsymbol{\theta}^{\text{vis}},\boldsymbol{\theta}^{\text{txt}} \right\}$ with meta-regularization to generalize well on the new tasks without losing the generalizability of the pretrained VLMs (e.g., CLIP). In the inner-loop (Eq. \ref{eq:inner}), we adapt the soft prompts $\Theta$ with the gradients $\boldsymbol{g}$ of the loss $\mathcal{L}$ and modulated gradients $\boldsymbol{g}_{\text{reg}}=\mathcal{M}^{\boldsymbol{\phi}}\left(\boldsymbol{g}_{\text{reg}};\boldsymbol{g} \right)$. In the outer-loop (Eq. \ref{eq:outer1}, \ref{eq:outer2}), the soft prompts $\Theta$ and the gradient modulation function $\boldsymbol{\phi}$ are updated on the augmented validation set $D^{\text{val}}$. The image encoder $f$ and text encoder $g$ of the pretrained vision-language models are frozen during the training phase.
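
The inner-loop/outer-loop structure described in the Figure 2 caption can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the quadratic task and validation losses, the regularizer that pulls the prompt toward a fixed "pretrained" point `theta0`, the elementwise scaling `phi * g_reg` standing in for the modulation function $\mathcal{M}^{\boldsymbol{\phi}}$, the hand-derived gradients, and all function names. Task augmentation of the validation set is not modeled.

```python
def inner_step(theta, phi, theta0, train_target, alpha):
    """One inner-loop update: theta' = theta - alpha * (g + phi * g_reg).

    g     = gradient of the toy task loss 0.5 * ||theta - train_target||^2
    g_reg = gradient of the toy regularizer 0.5 * ||theta - theta0||^2
    phi scales g_reg elementwise (a hypothetical stand-in for M^phi).
    """
    adapted = []
    for t, p, t0, y in zip(theta, phi, theta0, train_target):
        g = t - y
        g_reg = t - t0
        adapted.append(t - alpha * (g + p * g_reg))
    return adapted


def val_loss(adapted, val_target):
    """Toy validation loss 0.5 * ||theta' - val_target||^2 on adapted prompts."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(adapted, val_target))


def outer_step(theta, phi, theta0, train_target, val_target, alpha, beta):
    """One outer-loop update of both theta and phi through the inner step.

    The chain rule is applied by hand for this toy problem:
      d theta' / d theta = 1 - alpha * (1 + phi)
      d theta' / d phi   = -alpha * (theta - theta0)
    """
    adapted = inner_step(theta, phi, theta0, train_target, alpha)
    new_theta, new_phi = [], []
    for t, p, t0, a, v in zip(theta, phi, theta0, adapted, val_target):
        err = a - v  # derivative of the validation loss w.r.t. theta'
        d_theta = err * (1.0 - alpha * (1.0 + p))
        d_phi = err * (-alpha * (t - t0))
        new_theta.append(t - beta * d_theta)
        new_phi.append(p - beta * d_phi)
    return new_theta, new_phi


def train(steps=200, alpha=0.1, beta=0.1):
    """Run the toy bi-level loop; return (initial_loss, final_loss, theta, phi)."""
    theta0 = [0.0, 0.0]        # stand-in for the pretrained prompt state
    train_target = [1.0, 2.0]  # optimum of the toy task loss
    val_target = [0.5, 1.0]    # optimum of the toy validation loss
    theta, phi = list(theta0), [0.0, 0.0]
    initial = val_loss(inner_step(theta, phi, theta0, train_target, alpha), val_target)
    for _ in range(steps):
        theta, phi = outer_step(theta, phi, theta0, train_target, val_target, alpha, beta)
    final = val_loss(inner_step(theta, phi, theta0, train_target, alpha), val_target)
    return initial, final, theta, phi


if __name__ == "__main__":
    initial, final, theta, phi = train()
    print(f"validation loss after adaptation: {initial:.4f} -> {final:.6f}")
```

In a real setting the gradients would come from automatic differentiation through the frozen encoders, and the modulation function would likely be richer than a per-coordinate scale. The toy only shows how the prompt parameters and the modulation parameters are both updated from the validation loss of the adapted prompts.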
