BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning

Qianhan Feng, Lujing Xie, Shijie Fang, Tong Lin

TL;DR

This paper addresses class-imbalanced semi-supervised learning (CISSL), where biased pseudo-labels and uneven class distributions hinder performance. It introduces BaCon, a Balanced Feature-Level Contrastive Learning method that regularizes feature representations by using class-wise feature centers as positive anchors, selecting reliable negatives, and applying a dynamic, class-aware temperature to balance learning. BaCon is designed as a plug-in for existing SSL pipelines (e.g., FixMatch). It incorporates memory banks, a projection head, and an auxiliary classifier, and it achieves state-of-the-art results on long-tailed datasets such as CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT while remaining robust under extreme imbalance. The approach emphasizes representation-level alignment to reduce reliance on biased backbone representations, offering practical improvements for real-world imbalanced data.
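To make the positive-anchor idea concrete, here is a minimal Python sketch of a per-class memory bank that stores confident projected features and averages them into normalized class centers. It is an illustration under stated assumptions, not the authors' implementation. The class name `ClassMemoryBank`, the FIFO size per class, the 0.95 confidence threshold, and the use of a plain normalized mean as the center are all hypothetical choices.

```python
import math
from collections import defaultdict, deque


def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class ClassMemoryBank:
    """Hypothetical per-class FIFO bank of projected features (illustrative only)."""

    def __init__(self, size_per_class=64):
        # One bounded queue per class, so head classes cannot evict tail-class entries.
        self.banks = defaultdict(lambda: deque(maxlen=size_per_class))

    def push(self, features, labels, confidences, threshold=0.95):
        # ASSUMPTION: only features whose (pseudo-)label confidence passes the
        # threshold are stored; labeled samples could simply use confidence 1.0.
        for f, y, c in zip(features, labels, confidences):
            if c >= threshold:
                self.banks[y].append(l2_normalize(f))

    def centers(self):
        # ASSUMPTION: class center = unit-normalized mean of the stored features.
        out = {}
        for y, feats in self.banks.items():
            if feats:
                dim = len(feats[0])
                mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
                out[y] = l2_normalize(mean)
        return out
```

Keeping a separate bound per class means tail-class centers are not overwritten by the much larger stream of head-class features. Whether BaCon sizes or updates its banks this way is not stated in this summary.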

Abstract

Semi-supervised Learning (SSL) reduces the need for extensive annotations in deep learning, but the more realistic challenge of imbalanced data distributions in SSL remains largely unexplored. In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions. Most existing methods address this issue at the instance level through reweighting or resampling, but their performance is heavily limited by their reliance on biased backbone representations. Other methods perform feature-level adjustments such as feature blending, but these may introduce unfavorable noise. In this paper, we discuss the benefits of a more balanced feature distribution for the CISSL problem and propose a Balanced Feature-Level Contrastive Learning method (BaCon). Our method directly regularizes the distribution of instance representations in a carefully designed contrastive manner. Specifically, class-wise feature centers are computed as positive anchors, while negative anchors are selected by a straightforward yet effective mechanism. A distribution-related temperature adjustment dynamically controls the class-wise contrastive strength. Comprehensive experiments on the CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT datasets across various settings demonstrate the effectiveness of our method. For example, BaCon surpasses the instance-level method FixMatch-based ABC on CIFAR10-LT by 1.21% in accuracy, and outperforms the state-of-the-art feature-level method CoSSL on CIFAR100-LT by 0.63% in accuracy. Under more extreme imbalance, BaCon also shows better robustness than other methods.
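The contrastive objective described in the abstract can be sketched as an InfoNCE-style loss in which each confident feature is pulled toward its class center and pushed away from confident features of other classes in the batch, with a temperature chosen per class. This is a hedged plain-Python sketch, not the paper's formulation. The function names, the confidence threshold used to pick "reliable" negatives, and especially the linear frequency-to-temperature mapping in `class_temperature` are assumptions for illustration.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_temperature(count, max_count, tau_min=0.1, tau_max=0.5):
    # ASSUMPTION: temperature grows linearly with relative class frequency,
    # so rarer classes get a sharper softmax. Not the paper's formula.
    return tau_min + (tau_max - tau_min) * count / max_count


def balanced_center_loss(feats, labels, confs, centers, class_counts, threshold=0.95):
    """Hypothetical InfoNCE-style loss: the class center is the positive, and
    confident features of other classes in the batch are the negatives.
    `centers` maps class -> unit-norm center vector."""
    feats = [l2_normalize(f) for f in feats]
    max_count = max(class_counts.values())
    total, n_terms = 0.0, 0
    for z, y, c in zip(feats, labels, confs):
        # Skip unreliable anchors and classes that have no center yet.
        if c < threshold or y not in centers:
            continue
        tau = class_temperature(class_counts[y], max_count)
        # ASSUMPTION: "reliable" negatives = confident features with a different label.
        negs = [f for f, yj, cj in zip(feats, labels, confs)
                if yj != y and cj >= threshold]
        logits = [dot(z, centers[y]) / tau] + [dot(z, n) / tau for n in negs]
        # Numerically stable log-sum-exp over the positive and the negatives.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denom - logits[0]  # -log softmax probability of the positive
        n_terms += 1
    return total / n_terms if n_terms else 0.0
```

In this sketch a lower temperature sharpens the softmax for rarer classes; the direction and exact shape of BaCon's temperature adjustment should be taken from the paper's equations.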

Paper Structure

This paper contains 21 sections, 15 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: The gradient of ABC on the representation layer is still biased. In contrast, BaCon provides an extra gradient that narrows the gap between the estimated gradient $g$ and the ideal optimal gradient $\hat{g}$. $r$ is the feature representation.
  • Figure 2: FixMatch-based ABC trained in two steps (b), compared with the normally trained model (a). A remarkable improvement can be observed in (c).
  • Figure 3: Overall training procedure of BaCon. The circle and star filled with squares represent the current representation and the corresponding positive anchor point, respectively. Circles filled with slashes in different colors represent negative instance features belonging to different classes in the current mini-batch.
  • Figure 4: t-SNE visualization of balanced test-set features learned by algorithms trained on CIFAR10-LT.