Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration

Emanuel Sanchez Aimar, Nathaniel Helgesen, Yonghao Xu, Marco Kuhlmann, Michael Felsberg

TL;DR

This work tackles long-tailed semi-supervised learning under label shift by introducing ADELLO, a framework that combines Flexible Distribution Alignment (FlexDA) with Complementary Consistency Regularization (CCR). FlexDA dynamically aligns the classifier to the unknown unlabeled data distribution using a time-smoothed target prior $\hat{\mathcal{Q}}_{\alpha_t}$ and logit-adjusted losses that evolve toward a balanced prior, improving data utilization and debiasing during training. CCR exploits low-confidence pseudo-labels via masked distillation at a controlled temperature, enabling broader data usage and mitigating confirmation bias. Across CIFAR-LT, STL10-LT, and ImageNet127, ADELLO delivers state-of-the-art LTSSL performance and substantially better calibration (lower ECE/MCE), demonstrating robustness to various label-shift scenarios and practical impact for scalable, well-calibrated semi-supervised learning. The method preserves computational efficiency by not adding forward passes or extra classifiers, making it appealing for real-world LTSSL deployments.
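
As a rough illustration of the FlexDA idea summarized above, the sketch below tempers an estimated prior toward the uniform prior and applies a logit-adjusted cross-entropy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the polynomial form of the schedule, the `alpha_min` parameter, and the exact sign convention of the adjustment are illustrative assumptions that follow the general logit-adjustment recipe.

```python
import math

def smooth_prior(prior, alpha):
    """Temper a class prior: p_k**alpha / sum_j p_j**alpha.
    alpha = 1 keeps the prior unchanged; alpha = 0 gives the uniform prior."""
    powered = [p ** alpha for p in prior]
    total = sum(powered)
    return [p / total for p in powered]

def alpha_schedule(step, total_steps, alpha_min=0.0, degree=2):
    """Hypothetical polynomial schedule: alpha decays from 1 to alpha_min.
    degree = 2 corresponds to a quadratic progression."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - (1.0 - alpha_min) * progress ** degree

def logit_adjusted_ce(logits, label, source_prior, target_prior):
    """Cross-entropy on logits shifted by log(source_prior / target_prior).
    Assumption: the shift makes the unadjusted logits fit the target prior
    while the loss is computed on data drawn from the source prior."""
    adjusted = [
        z + math.log(s) - math.log(t)
        for z, s, t in zip(logits, source_prior, target_prior)
    ]
    peak = max(adjusted)  # subtract the max for a stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[label]
```

With these definitions, the target prior at a given step would be `smooth_prior(estimated_prior, alpha_schedule(step, total_steps))`, so the adjustment starts at the estimated unlabeled prior and ends at a balanced one when `alpha_min` is 0.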

Abstract

Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, challenged by skewed labeled distributions that bias classifiers. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework designed to dynamically estimate and align predictions with the actual distribution of unlabeled data and achieve a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, promoting fair data usage across classes and effectively leveraging underconfident samples. This method, encapsulated in ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-the-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127, addressing class imbalance challenges in semi-supervised learning. Our code is available at https://github.com/emasa/ADELLO-LTSSL.
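
The distillation-based consistency loss for underconfident samples can be pictured with the following per-sample sketch. It is only an illustration under assumptions: the function names, the 0.95 confidence threshold, the default temperature, and the choice to apply the temperature to both teacher and student are hypothetical and not taken from the paper. The teacher logits stand for the prediction on a weakly augmented view, the student logits for the strongly augmented view.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]

def complementary_distill_loss(weak_logits, strong_logits,
                               threshold=0.95, temperature=1.0):
    """Soft-label distillation applied only to low-confidence samples.
    Confident samples are masked out here (returns 0.0), on the assumption
    that a separate hard pseudo-label loss already covers them."""
    teacher_probs = [math.exp(lp) for lp in log_softmax(weak_logits)]
    if max(teacher_probs) >= threshold:
        return 0.0
    # Soft target from the weak view, at the chosen temperature.
    soft_target = [math.exp(lp) for lp in log_softmax(weak_logits, temperature)]
    student_log_probs = log_softmax(strong_logits, temperature)
    # Cross-entropy between the soft target and the student prediction.
    return -sum(q * lp for q, lp in zip(soft_target, student_log_probs))
```

In a training loop the teacher logits would normally be treated as constants (no gradient), and the per-sample losses would be averaged over the unlabeled batch.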

Paper Structure (18 sections, 11 equations, 8 figures, 14 tables, 1 algorithm)


Figures (8)

  • Figure 1: Long-tailed semi-supervised learning considers a challenging scenario, where a labeled dataset with a skewed class distribution, $\mathcal{P}_L(y)$, see (a), can bias the model towards frequent classes. This challenge is exacerbated by the use of a larger unlabeled dataset with an unknown class distribution, $\mathcal{Q}(y)$, see (b), risking the reinforcement of data biases, which in turn leads to uncalibrated probabilities. Our evaluation focuses on misclassification error (complement of accuracy) and expected calibration error to test generalization and calibration. Our approach shows consistent improvements in both respects, as shown in (c) and (d).
  • Figure 2: Method overview: Our flexible distribution alignment (FlexDA) aligns the classifier with the correct prior, dynamically estimated from unlabeled data. This approach extends FixMatch with a bias-adjusted supervised loss (\eqref{eq:flexda_s}, Sec. \ref{align}) and a bias-adjusted consistency loss (\eqref{eq:flexda_u}, Sec. \ref{align}) to debias high-confidence hard pseudo-labels. We also introduce a bias-adjusted complementary consistency loss to learn from low-confidence soft pseudo-labels (Sec. \ref{distill}). A progressive scheduler steadily smooths the target prior, $\hat{\mathcal{Q}}_{\alpha_t}$, leading to a balanced classifier by the end of training.
  • Figure 3: Prior estimation under label shift. A comparison of KL divergence shows 1) a small difference between the estimated prior, $\hat{\mathcal{Q}}$, and the ground-truth prior, $\mathcal{Q}$, during most of the training (blue curve), and 2) a larger disparity between $\hat{\mathcal{Q}}$ and the uniform prior, $\mathcal{P}_{\text{bal}}$, (orange curve). The progression of a quadratic scheduler ($d = 2$) is shown in (d) (green curve). Label shift settings: (a) forward, (b) balanced, and (c) reversed long-tailed, computed for CIFAR10-LT100.
  • Figure 3: Test accuracy (%) on CIFAR10-LT, CIFAR100-LT, and STL10-LT under low-label regimes. $\dagger$: labeled prior as target. $\ddagger$: results from prior work \cite{oh2021daso}. Best scores in bold, second-best underlined.
  • Figure 4: Varying label shift.
  • ...and 3 more figures
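
The captions of Figures 1 and 3 rely on two quantities, expected calibration error and KL divergence between class priors. The sketch below shows one common way to compute both. It is a hedged illustration, not the paper's evaluation code: the equal-width binning with 10 bins, the function names, and the `eps` smoothing are assumptions, and the direction of the KL divergence used in Figure 3 is not stated in the caption.

```python
import math

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins: the sample-weighted average
    of |accuracy - mean confidence| over the non-empty bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin k covers [k / n_bins, (k + 1) / n_bins); conf = 1.0 goes last.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1.0 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete class distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )
```

For example, four predictions made at confidence 0.9 with three of them correct give an accuracy of 0.75 in a single bin, so the ECE is the gap between 0.9 and 0.75.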