Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning

Kai Gan, Tong Wei

TL;DR

This paper identifies the aggregated biases and cognitive deviation problems inherent in foundation models, and proposes a simple yet effective solution by imposing balanced margin softmax and decoupled label smoothing.

Abstract

Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in the emergence of numerous method variations. However, practitioners often encounter challenges when attempting to deploy these methods due to their subpar performance. In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models. We identify the aggregated biases and cognitive deviation problems inherent in foundation models, and propose a simple yet effective solution by imposing balanced margin softmax and decoupled label smoothing. Through extensive experiments, we demonstrate that FineSSL sets a new state of the art for SSL on multiple benchmark datasets, reduces the training cost by over six times, and can seamlessly integrate various fine-tuning and modern SSL algorithms. The source code is available at https://github.com/Gank0078/FineSSL.

Paper Structure (30 sections, 8 equations, 8 figures, 12 tables, 1 algorithm)

Figures (8)

  • Figure 1: Left: Fine-tuning a pre-trained ViT significantly outperforms training a Wide ResNet from scratch. Right: VPT improves over full fine-tuning and linear probing by a large margin. Experiments are conducted on CIFAR-100 using FixMatch. Throughout the paper, the setting with 4 labeled samples per class is denoted "N4", and other settings are defined accordingly.
  • Figure 2: Left: The distribution of pseudo-labels for unlabeled data. Classes are sorted by the frequency of their pseudo-labels. Right: The average confidence across settings with different numbers of labeled samples per class, based on FixMatch. "DLS" denotes the decoupled label smoothing proposed in this paper. Experiments are conducted on CIFAR-100.
  • Figure 3: The accuracy and entropy of pseudo-labels for FixMatch, FlexMatch, DebiasPL, and FineSSL on CIFAR-100 with 4 labeled samples per class, and the sensitivity to $\lambda$ and $\alpha_0$ under various settings on CIFAR-100.
  • Figure 4: The accuracy in the N5 and N10 OpenSSL settings on CIFAR-100 for different methods, and the distribution of confidence scores for ID and OOD samples of FixMatch, FineSSL w/o DLS, and FineSSL in the N5 OpenSSL setting.
  • Figure 5: The sensitivity to $\gamma$, prompt length, and training epochs under various settings on CIFAR-100.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Definition 4.1