FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled Data

Hezhao Liu, Yang Lu, Mengke Li, Yiqun Zhang, Shreyank N Gowda, Chen Gong, Hanzi Wang

TL;DR

FATE targets semi-supervised learning under extreme label scarcity by decoupling adaptation and classification into a two-stage prompt-tuning process. It first learns Distribution-adaptive Prompts $P_d$ from unlabeled data to align the backbone with downstream distributions, then uses Classification Prompts $P_c$ in a refined SSL objective to leverage both labeled and unlabeled data for final classification. The framework is shown to be effective for both vision and vision-language models, delivering substantial gains over state-of-the-art SSL and PEFT methods across multiple benchmarks. This approach enables robust SSL performance with minimal labeled data, highlighting the value of prompt-tuning and distribution-aware adaptation in settings with scarce supervision.
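The two-stage schedule described above can be sketched as a simple rule for which parameter group is trained in each stage. This is a minimal illustration, not the authors' code: `ParamGroup` and `configure_stage` are hypothetical names, and keeping the backbone frozen throughout is an assumption based on the usual prompt-tuning setup.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    """A named group of parameters with a flag saying whether it is updated."""
    name: str
    trainable: bool = False


def configure_stage(groups, stage):
    """Enable training for exactly one prompt group per stage.

    Stage 1 tunes only the Distribution-adaptive Prompts (P_d).
    Stage 2 tunes only the Classification Prompts (P_c), with P_d fixed.
    Assumption: the pre-trained backbone stays frozen in both stages.
    """
    tuned = {1: "P_d", 2: "P_c"}[stage]
    for group in groups.values():
        group.trainable = (group.name == tuned)
    return [g.name for g in groups.values() if g.trainable]


groups = {name: ParamGroup(name) for name in ("backbone", "P_d", "P_c")}
stage1 = configure_stage(groups, 1)  # adaptation on unlabeled data
stage2 = configure_stage(groups, 2)  # classification on labeled + unlabeled data
print(stage1, stage2)
```

In a real training loop the same flags would typically map onto the optimizer's parameter list, so that gradients only update the prompts active in the current stage.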

Abstract

Semi-supervised learning (SSL) has achieved significant progress by leveraging both labeled and unlabeled data. However, existing SSL methods overlook a common real-world scenario in which labeled data is extremely scarce, potentially as limited as a single labeled sample in the dataset. General SSL approaches struggle to train effectively from scratch under such constraints, while methods utilizing pre-trained models often fail to find an optimal balance between leveraging limited labeled data and abundant unlabeled data. To address this challenge, we propose Firstly Adapt, Then catEgorize (FATE), a novel SSL framework tailored for scenarios with extremely limited labeled data. At its core, FATE is a two-stage prompt-tuning paradigm that exploits unlabeled data to compensate for scarce supervision signals and then transfers to downstream tasks. Concretely, FATE first adapts a pre-trained model to the feature distribution of downstream data using large volumes of unlabeled samples in an unsupervised manner. It then applies an SSL method specifically designed for pre-trained models to complete the final classification task. FATE is designed to be compatible with both vision and vision-language pre-trained models. Extensive experiments demonstrate that FATE effectively mitigates challenges arising from the scarcity of labeled samples in SSL, achieving an average performance improvement of 33.74% across seven benchmarks compared to state-of-the-art SSL methods. Code is available at https://anonymous.4open.science/r/Semi-supervised-learning-BA72.

Paper Structure

This paper contains 24 sections, 11 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Performance of SSL and few-shot algorithms under different numbers of labeled samples $k$ per class. Generally, performance degrades as the amount of labeled data decreases.
  • Figure 2: Implementation of FATE on the vision model. First, we concatenate the Distribution-adaptive Prompts (DP) to the embedding vectors of the unlabeled samples and optimize them with a contrastive learning loss. We then modify FixMatch: learnable Classification Prompts (CP) are concatenated to all embedding vectors, while the DP learned in the first stage are fixed and concatenated to the branches of the weakly augmented views of both labeled and unlabeled samples. Finally, the mean of the CP is used as the classification feature.
  • Figure 3: Implementation of FATE on the vision-language model. We use the original CLIP's zero-shot capability to pseudo-label the unlabeled data, selecting the top-$k$ samples with the highest predicted values for each class and training the DP for the visual encoder with the pseudo-labeled samples. Then we fix the DP just learned and design CP at the textual encoder side, optimizing it with the classification-stage loss.
  • Figure 4: The t-SNE visualization of FATE's implementation on the vision model for the CIFAR-10 test set. With the inclusion of DP/CP, the features of data points belonging to the same class cluster together, whereas the original features without DP/CP remain dispersed.
  • Figure 5: Test accuracy of FATE implemented on the vision-language model on four datasets with different $k$. We performed ablation experiments on DP and CP. With DP alone, model performance is already higher than that of the original zero-shot CLIP, and adding CP improves it further. Once $k$ increases beyond a certain point, the model's performance tends to plateau or decrease.
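The top-$k$ pseudo-label selection mentioned in the Figure 3 caption can be illustrated with a short sketch. It is not the authors' implementation: the function name `select_top_k_per_class` and the toy probabilities are hypothetical, and it assumes that a sample is a candidate only for the class it is most confidently predicted as.

```python
from collections import defaultdict


def select_top_k_per_class(probs, k):
    """Return {class_index: [sample indices]} with up to k samples per class.

    probs: one list of class probabilities per unlabeled sample,
           e.g. softmax outputs of a zero-shot classifier.
    Assumption: each sample competes only within its argmax class.
    """
    candidates = defaultdict(list)
    for idx, row in enumerate(probs):
        pred = max(range(len(row)), key=row.__getitem__)
        candidates[pred].append((row[pred], idx))

    selected = {}
    for cls, scored in candidates.items():
        # Highest confidence first; ties are broken by sample index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected[cls] = [idx for _, idx in scored[:k]]
    return selected


# Toy zero-shot probabilities for 5 unlabeled samples and 2 classes.
probs = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.20, 0.80],
    [0.70, 0.30],
    [0.45, 0.55],
]
pseudo = select_top_k_per_class(probs, k=2)
print(pseudo)  # {0: [0, 3], 1: [2, 4]}
```

Here sample 1 is predicted as class 0 but dropped because two other samples are more confident. This kind of filtering is one way to read the Figure 5 trend: a larger $k$ admits less confident, and likely noisier, pseudo-labels.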