CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning

Hyuck Lee, Heeyoung Kim

TL;DR

CDMAD addresses class imbalance in semi-supervised learning (SSL) when the class distribution of the unlabeled set is unknown or mismatched with that of the labeled set. At each training iteration it measures the classifier's bias towards each class by computing logits on a non-informative input (an image without any patterns, such as a solid color image), then subtracts these logits to refine pseudo-labels during training and class predictions at test time, without adding model complexity. Empirical results across multiple long-tailed datasets show that CDMAD consistently outperforms existing class-imbalanced SSL (CISSL) methods, with ablations confirming the contribution of each component. CDMAD can be viewed as an extension of post-hoc logit adjustment to CISSL, and it ensures Fisher consistency for the balanced error.

Abstract

Pseudo-label-based semi-supervised learning (SSL) algorithms trained on a class-imbalanced set face two cascading challenges: 1) classifiers tend to be biased towards majority classes, and 2) biased pseudo-labels are used for training. It is difficult to appropriately re-balance the classifiers in SSL because the class distribution of an unlabeled set is often unknown and could be mismatched with that of a labeled set. We propose a novel class-imbalanced SSL algorithm called class-distribution-mismatch-aware debiasing (CDMAD). For each iteration of training, CDMAD first assesses the classifier's degree of bias towards each class by calculating the logits on an image without any patterns (e.g., a solid color image), which can be considered irrelevant to the training set. CDMAD then refines biased pseudo-labels of the base SSL algorithm by ensuring the classifier's neutrality. CDMAD uses these refined pseudo-labels during the training of the base SSL algorithm to improve the quality of the representations. In the test phase, CDMAD similarly refines biased class predictions on test samples. CDMAD can be seen as an extension of post-hoc logit adjustment that addresses the challenge of incorporating the unknown class distribution of the unlabeled set when re-balancing the biased classifier under class distribution mismatch. CDMAD ensures Fisher consistency for the balanced error. Extensive experiments verify the effectiveness of CDMAD.
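
The refinement step described in the abstract can be illustrated with a minimal sketch. It assumes the refinement amounts to subtracting the logits computed on the pattern-free image from the logits computed on a sample, as the TL;DR states. The function names, the three-class setup, and the logit values are hypothetical and chosen only for illustration; the base SSL algorithm and the classifier itself are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine_logits(sample_logits, blank_logits):
    """Subtract the logits on the pattern-free image from the sample logits.

    Assumption: this element-wise subtraction is the refinement applied to
    both pseudo-labels (training) and class predictions (test).
    """
    return [s - b for s, b in zip(sample_logits, blank_logits)]


def predict(logits):
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=lambda k: logits[k])


# Hypothetical logits for a 3-class problem; class 0 plays the majority class.
blank_logits = [2.0, 0.5, -1.0]    # classifier output on a solid color image
sample_logits = [1.8, 1.2, -0.5]   # classifier output on an unlabeled or test sample

refined = refine_logits(sample_logits, blank_logits)
raw_label = predict(sample_logits)   # prediction before refinement
refined_label = predict(refined)     # prediction after refinement
refined_probs = softmax(refined)     # refined class probabilities
```

With these illustrative values the raw prediction is class 0, while the refined prediction is class 1, because most of the evidence for class 0 is also present on the pattern-free image.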

Paper Structure (28 sections, 1 theorem, 17 equations, 11 figures, 17 tables)

Key Result

Proposition 1

Given a solid color image $\mathcal{I}$ that is independent of the class labels $y$, the test-phase refinement by CDMAD is Fisher consistent for minimizing the balanced error rate (BER).
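
As a rough sketch of why such a statement is plausible, the standard argument from post-hoc logit adjustment can be written out. The notation here ($K$ classes, scorer $f$, classifier logits $g$) is assumed for illustration, and the paper's own proof may proceed differently. The balanced error rate of a scorer $f$ is

$$\mathrm{BER}(f) = \frac{1}{K}\sum_{k=1}^{K} \mathbb{P}_{x \mid y=k}\Big(k \notin \arg\max_{k'} f_{k'}(x)\Big),$$

and it is minimized by predicting

$$\arg\max_{k}\, \mathbb{P}(x \mid y=k) = \arg\max_{k}\, \frac{\mathbb{P}(y=k \mid x)}{\mathbb{P}(y=k)} = \arg\max_{k}\, \big[\log \mathbb{P}(y=k \mid x) - \log \mathbb{P}(y=k)\big].$$

If the softmax of $g(x)$ estimates $\mathbb{P}(y \mid x)$, and $\mathcal{I}$ is independent of $y$ so that the softmax of $g(\mathcal{I})$ estimates $\mathbb{P}(y \mid \mathcal{I}) = \mathbb{P}(y)$, then $g_k(x) - g_k(\mathcal{I})$ differs from the bracketed term only by a constant that does not depend on $k$. Both therefore share the same $\arg\max$.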

Figures (11)

  • Figure 1: Class probabilities on an image without any patterns.
  • Figure 2: Pseudo-label refinement process using CDMAD.
  • Figure 3: (a) and (b) present the class probabilities predicted on a white image using the proposed algorithm. (c) and (d) present the confusion matrices of the class predictions on test samples.
  • Figure 4: Code for refining pseudo-labels using CDMAD
  • Figure 5: Example images augmented using each data augmentation technique
  • ...and 6 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proof