Alternating Training-based Label Smoothing Enhances Prompt Generalization

Yang Chen, Yanbin Wei, Ke Jin, Yi Kong, James Kwok, Yu Zhang

TL;DR

This work tackles the limited generalization of prompt tuning in CLIP-based vision–language models by introducing Alternating Training-based Label Smoothing (ATLaS). ATLaS alternates supervision between one-hot labels and soft labels in cycles of $K$ epochs, with one epoch of soft-label supervision following every $K-1$ epochs of one-hot training. It augments this schedule with offline Class-wise Soft Labels (CSL) and Instance-wise Soft Labels (ISL), derived from textual and multimodal signals, to better regularize prompt learning. The authors provide a convergence analysis showing how ATLaS improves optimization relative to vanilla label smoothing, and demonstrate through extensive experiments across cross-dataset, domain, base-to-new, and few-shot settings that ATLaS and its CSL/ISL variants consistently improve generalization when integrated with diverse prompt-tuning baselines. The approach is computationally efficient, with CSL/ISL generation taking negligible time, and it is broadly compatible with existing prompt-tuning techniques, making it a practical enhancement for real-world VLM deployment.

Abstract

Recent advances in pre-trained vision-language models have demonstrated remarkable zero-shot generalization capabilities. To further enhance these models' adaptability to various downstream tasks, prompt tuning has emerged as a parameter-efficient fine-tuning method. However, despite its efficiency, the generalization ability of the learned prompts remains limited. In contrast, label smoothing (LS) has been widely recognized as an effective regularization technique that prevents models from becoming over-confident and improves their generalization. This inspires us to explore the integration of LS with prompt tuning. However, we have observed that vanilla LS actually weakens the generalization ability of prompt tuning. To address this issue, we propose the Alternating Training-based Label Smoothing (ATLaS) method, which alternately uses standard one-hot labels and soft labels generated by LS to supervise prompt tuning. Moreover, we introduce two types of efficient offline soft labels, Class-wise Soft Labels (CSL) and Instance-wise Soft Labels (ISL), to provide inter-class or instance-class relationships for prompt tuning. The theoretical properties of the proposed ATLaS method are analyzed. Extensive experiments demonstrate that the proposed ATLaS method, combined with CSL and ISL, consistently enhances the generalization performance of prompt tuning. Moreover, the proposed ATLaS method is highly compatible with prevalent prompt tuning methods, enabling seamless integration into existing methods.
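For readers unfamiliar with label smoothing, the following is a minimal sketch of how vanilla LS turns a one-hot label into a soft label, and how a cross-entropy loss can be computed against it. It uses the standard formulation (1 - eps) * one_hot + eps / C and is not taken from the paper's code. The function names and the default `eps = 0.1` are assumptions chosen for illustration.

```python
import math


def smooth_labels(target: int, num_classes: int, eps: float = 0.1) -> list:
    """Vanilla label smoothing: (1 - eps) * one_hot + eps / num_classes.

    `eps` is a hypothetical smoothing coefficient; eps = 0 recovers one-hot labels.
    """
    uniform = eps / num_classes
    return [
        (1.0 - eps) + uniform if c == target else uniform
        for c in range(num_classes)
    ]


def soft_cross_entropy(logits: list, soft_target: list) -> float:
    """Cross-entropy between a soft target and softmax(logits)."""
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(soft_target, logits))


if __name__ == "__main__":
    soft = smooth_labels(target=1, num_classes=4, eps=0.2)
    print(soft)
    print(soft_cross_entropy([2.0, 0.5, -1.0, 0.0], soft))
```

Setting `eps` to zero gives ordinary one-hot supervision, so the same loss function can serve both label types.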

Paper Structure

This paper contains 35 sections, 7 theorems, 31 equations, 8 figures, 8 tables, 1 algorithm.

Key Result

Lemma 1

$\mathbb{E}_{(\mathbf{x}, \mathbf{\hat{y}})} \left\| \nabla_{\mathbf{v}} F(\mathbf{v}) - \nabla_{\mathbf{v}} \ell(\mathbf{\hat{y}}, f(\mathbf{v}; \mathbf{x})) \right\|^2 \leq \hat{\sigma}^2 = \kappa \sigma^2$ where $\kappa > 0$ is a constant, and $\sigma^2$ is the variance described in Assumption as

Figures (8)

  • Figure 1: Comparison of test accuracies among CoOp, CoOp with vanilla label smoothing (LS), TFKD, OLS, and alternating training-based label smoothing (ATLaS), where prompts are trained on the base classes of ImageNet and subsequently evaluated on new classes.
  • Figure 2: Overview of the proposed ATLaS method. ① ATLaS supervises the prompt tuning process by alternating between one-hot labels and soft labels, where soft label supervision follows every $K-1$ epochs of one-hot label training. ② While the basic ATLaS employs vanilla label smoothing for soft label generation, we further exploit the multimodal properties of CLIP to offer two additional options: class-wise soft labels and instance-wise soft labels.
  • Figure 3: Case study of the soft labels generated by ISL and CSL.
  • Figure 4: Performance of various methods on the 11 datasets under the few-shot classification setting.
  • Figure 5: Comparison of test accuracies for CoOp with different LS strategies on new classes of SUN397.
  • ...and 3 more figures
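
The alternating schedule described in the Figure 2 caption can be sketched as follows. This is an illustrative reading of that caption, not the authors' implementation: epochs are assumed to be 1-indexed, every `k`-th epoch is assumed to use soft labels, and the names `label_mode` and `schedule` are hypothetical.

```python
def label_mode(epoch: int, k: int) -> str:
    """Return the supervision type for a 1-indexed epoch.

    Assumption: one soft-label epoch follows every k - 1 one-hot epochs,
    so soft labels are used whenever the epoch index is a multiple of k.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return "soft" if epoch % k == 0 else "one_hot"


def schedule(num_epochs: int, k: int) -> list:
    """List the supervision type for epochs 1..num_epochs."""
    return [label_mode(epoch, k) for epoch in range(1, num_epochs + 1)]


if __name__ == "__main__":
    # With k = 3: two one-hot epochs, then one soft-label epoch, repeated.
    print(schedule(6, 3))
    # → ['one_hot', 'one_hot', 'soft', 'one_hot', 'one_hot', 'soft']
```

Under this reading, `k = 1` degenerates to soft-label supervision in every epoch, which corresponds to training with the soft labels alone.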

Theorems & Definitions (7)

  • Lemma 1
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • Theorem 2