Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models

Biao Chen; Lin Zuo; Mengmeng Jing; Kunbin He; Yuchen Wang

TL;DR

The paper addresses overfitting in vision-language prompt learning with Dropout Prompt Learning, which applies token-level dropout to both the visual and textual branches. Importance Weighted Token Dropout (IWTD) sets dropout using a multimodal importance metric, and Residual Entropy Regularization preserves cross-modal alignment while fostering diverse representations. The approach is validated on 15 benchmarks, where it improves base-to-novel generalization, few-shot performance, and out-of-distribution robustness; ablations confirm the critical role of cross-modal attention signals and residual entropy. The method remains computationally competitive and applies across architectures and adapters, which makes it practical for robust, data-efficient VLM adaptation.

Abstract

Dropout is a widely used regularization technique that improves a model's generalization ability by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, which applies dropout to improve the robustness of vision-language models. Unlike vanilla dropout, we apply dropout to the tokens of the textual and visual branches, evaluating each token's significance from both intra-modal context and inter-modal alignment, which enables a flexible dropout probability for each token. Moreover, to maintain semantic alignment for general knowledge transfer while encouraging the diverse representations that dropout introduces, we further propose residual entropy regularization. Experiments on 15 benchmarks show our method's effectiveness in challenging scenarios such as low-shot learning, long-tail classification, and out-of-distribution generalization. Notably, on base-to-novel generalization, our method surpasses regularization-based methods, including KgCoOp by 5.10% and PromptSRC by 2.13%.
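To make the idea of per-token dropout probabilities concrete, here is a minimal sketch in plain Python. It is not the paper's IWTD: the abstract does not give the importance metric or how scores map to probabilities, so the function names (`drop_probabilities`, `token_dropout`), the min-max normalisation, the `base_rate` and `max_rate` parameters, and the choice to zero out dropped tokens are all assumptions made for illustration. The importance scores are taken as given inputs, standing in for whatever intra-modal and inter-modal signal the method actually computes.

```python
import random
from typing import List, Optional, Sequence, Tuple


def drop_probabilities(importance: Sequence[float], base_rate: float = 0.3,
                       max_rate: float = 0.9) -> List[float]:
    """Map per-token importance scores to per-token drop probabilities.

    Assumption (not from the paper): min-max normalise the scores, then give
    less important tokens a proportionally higher drop probability, scaled so
    the average probability is about `base_rate` (before clipping).
    """
    lo, hi = min(importance), max(importance)
    if hi == lo:
        # All tokens equally important: fall back to uniform dropout.
        return [base_rate] * len(importance)
    unimportance = [1.0 - (s - lo) / (hi - lo) for s in importance]
    mean_u = sum(unimportance) / len(unimportance)
    return [min(max_rate, base_rate * u / mean_u) for u in unimportance]


def token_dropout(tokens: Sequence[Sequence[float]],
                  importance: Sequence[float],
                  base_rate: float = 0.3,
                  rng: Optional[random.Random] = None
                  ) -> Tuple[List[List[float]], List[bool]]:
    """Zero out whole token vectors, each with its own drop probability."""
    rng = rng or random.Random()
    probs = drop_probabilities(importance, base_rate)
    keep = [rng.random() >= p for p in probs]
    dropped = [list(tok) if k else [0.0] * len(tok)
               for tok, k in zip(tokens, keep)]
    return dropped, keep


# Hypothetical toy input: three 2-dimensional tokens with made-up scores.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
importance = [0.9, 0.1, 0.5]
masked, keep = token_dropout(tokens, importance, rng=random.Random(0))
```

Under this particular mapping the highest-scoring token gets a drop probability of zero and is always kept, while the lowest-scoring token is dropped most often. A real implementation would apply the same idea to the visual and textual token sequences separately and only during training.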
