Leveraging Label Proportion Prior for Class-Imbalanced Semi-Supervised Learning

Kohki Akiba; Shinnosuke Matsuo; Shota Harada; Ryoma Bise

TL;DR

This work introduces Proportion Loss, borrowed from learning from label proportions (LLP), into semi-supervised learning (SSL) as a regularization term that mitigates bias across both majority and minority classes, and formulates a stochastic variant of the loss that accounts for fluctuations in mini-batch composition.

Abstract

Semi-supervised learning (SSL) often suffers under class imbalance, where pseudo-labeling amplifies majority bias and suppresses minority performance. We address this issue with a lightweight framework that, to our knowledge, is the first to introduce Proportion Loss from learning from label proportions (LLP) into SSL as a regularization term. Proportion Loss aligns model predictions with the global class distribution, mitigating bias across both majority and minority classes. To further stabilize training, we formulate a stochastic variant that accounts for fluctuations in mini-batch composition. Experiments on the Long-tailed CIFAR-10 benchmark show that integrating Proportion Loss into FixMatch and ReMixMatch consistently improves performance over these baselines across imbalance severities and label ratios, and achieves competitive or superior results compared with existing class-imbalanced SSL (CISSL) methods, particularly under scarce-label conditions.
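
The abstract describes Proportion Loss only at a high level. A minimal sketch, assuming the standard LLP form: the cross-entropy between a known class-proportion vector (here the global class prior) and the mean predicted class probabilities over a bag (here the mini-batch). The names `proportion_loss`, `total_loss`, and the weight `lam` are hypothetical; which unlabeled views the paper applies the term to, and how it is weighted against the FixMatch or ReMixMatch objective, are not stated in this section.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one row of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def proportion_loss(batch_logits, prior, eps=1e-12):
    """Cross-entropy between a class-proportion prior and the mini-batch
    mean of predicted class probabilities (assumed standard LLP form)."""
    probs = [softmax(row) for row in batch_logits]
    n = len(probs)
    num_classes = len(prior)
    # Estimated proportion: average predicted probability for each class.
    est = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    # eps guards against log(0) when a class receives no probability mass.
    return -sum(prior[c] * math.log(est[c] + eps) for c in range(num_classes))

def total_loss(base_ssl_loss, batch_logits, prior, lam=1.0):
    # Hypothetical combination: a base SSL loss (e.g., FixMatch's supervised
    # plus consistency terms) plus the weighted proportion regularizer.
    return base_ssl_loss + lam * proportion_loss(batch_logits, prior)
```

The loss is minimized, at the entropy of the prior, exactly when the batch-mean prediction equals the prior, so a model that over-predicts majority classes on a batch is pushed back toward the global distribution, and under-predicted minority classes are pushed up.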

Paper Structure (13 sections, 4 equations, 4 figures, 1 table)

Introduction
Related work
Proportion-Regularized Semi-Supervised Learning
Problem Setting
Proportion Regularization for SSL
Regularization with Proportion Loss
Proportion Perturbation via Hypergeometric Sampling
Experiment
Experimental Setup
Comparison of Accuracy
Estimated Proportion after Training
Analysis of Pseudo-Label Selection
Conclusion
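
The outline's "Proportion Perturbation via Hypergeometric Sampling" section, together with the abstract's stochastic variant, suggests that the per-batch target proportion is drawn rather than fixed. A minimal sketch, assuming the target is a multivariate hypergeometric draw of `batch_size` items, without replacement, from a population with dataset-level class counts `class_counts` (for example, the labeled class distribution scaled to the unlabeled set size; this is an assumption). The function name is hypothetical; the sampled vector would replace the fixed prior in the proportion loss for that mini-batch.

```python
import bisect
import random
from collections import Counter

def sample_batch_proportion(class_counts, batch_size, rng=random):
    """Draw a per-batch target proportion by sampling batch_size items
    without replacement from a population with the given class counts
    (a multivariate hypergeometric draw)."""
    total = sum(class_counts)
    if not 0 < batch_size <= total:
        raise ValueError("batch_size must be in (0, total]")
    # Cumulative upper bounds: item index i belongs to the first class c
    # with i < bounds[c]; empty classes are skipped automatically.
    bounds = []
    running = 0
    for n in class_counts:
        running += n
        bounds.append(running)
    # Sample distinct item indices, then map each index to its class.
    picks = rng.sample(range(total), batch_size)
    counts = Counter(bisect.bisect_right(bounds, i) for i in picks)
    return [counts.get(c, 0) / batch_size for c in range(len(class_counts))]
```

For instance, `sample_batch_proportion([500, 50, 5], 64, random.Random(0))` returns a three-class target whose minority entry fluctuates between draws, mimicking how a real mini-batch may contain zero or several minority samples. Sampling from `range(total)` avoids materializing the whole population list.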

Figures (4)

Figure 1: Our approach: Regularization with the label proportion prior for semi-supervised learning (SSL).
Figure 2: Overview of the proposed method.
Figure 3: Comparison of estimated output proportions after training. Red indicates overestimation and blue indicates underestimation across classes (Class 1 = major, Classes 7–9 = minor).
Figure 4: Recall of pseudo-labels for the most major class (left) and most minor class (right) during training.
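
Figures 3 and 4 report two diagnostics. The sketch below computes them under stated assumptions: the estimated output proportion is taken as the fraction of argmax predictions per class (the paper may instead average softmax outputs), and pseudo-label recall counts an unlabeled sample of class `cls` as recovered when its pseudo-label is `cls` and its confidence clears a FixMatch-style threshold (0.95 is assumed here). All function names are hypothetical.

```python
from collections import Counter

def estimated_proportion(predicted_labels, num_classes):
    # Fraction of samples the model assigns to each class.
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    return [counts.get(c, 0) / n for c in range(num_classes)]

def proportion_deviation(estimated, true):
    # Positive = overestimation (red in Figure 3), negative = underestimation (blue).
    return [e - t for e, t in zip(estimated, true)]

def pseudo_label_recall(true_labels, pseudo_labels, confidences, cls, threshold=0.95):
    """Among unlabeled samples whose true label is cls, the fraction that
    received pseudo-label cls with confidence >= threshold."""
    relevant = [i for i, y in enumerate(true_labels) if y == cls]
    if not relevant:
        return 0.0
    hits = sum(1 for i in relevant
               if pseudo_labels[i] == cls and confidences[i] >= threshold)
    return hits / len(relevant)
```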