Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning

Zeju Li; Ying-Qiu Zheng; Chen Chen; Saad Jbabdi

TL;DR

This work addresses imbalanced semi-supervised learning (SSL), where pseudo-label quality deteriorates for minority classes. It introduces SEVAL, a validation-data-driven framework that jointly learns pseudo-label refinement offsets $\boldsymbol{\beta}$ and class-wise thresholds $\boldsymbol{\tau}$, using a held-out labeled set to guide a class-balanced curriculum. Theoretical analysis links Bayes optimality under label-distribution shift to offset-based refinement and argues for precision-driven thresholding rather than recall alone. Empirically, SEVAL achieves state-of-the-art results across multiple long-tailed SSL benchmarks, is robust to varying imbalance ratios and limited labeled data, and is compatible with diverse SSL frameworks. Its validation-driven, data-efficient design makes it a practical plug-in improvement for imbalanced SSL, with broad applicability and potential for further calibration-based refinements.
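To make the roles of $\boldsymbol{\beta}$ and $\boldsymbol{\tau}$ concrete, here is a minimal sketch of how a single unlabeled sample could be pseudo-labeled once both are known. It assumes that refinement means adding $\beta_c$ to the class-$c$ logit and that the threshold is compared with the softmax confidence of the refined prediction; the function name `refine_and_threshold` and the numbers are hypothetical, not taken from the SEVAL code.

```python
import math


def refine_and_threshold(logits, beta, tau):
    """Hypothetical sketch: add per-class offsets `beta` to the logits, then
    accept the predicted class only if its refined confidence reaches
    tau[predicted class]. Returns the class index, or None if rejected."""
    # Assumed form of refinement: additive per-class logit offsets.
    refined = [z + b for z, b in zip(logits, beta)]
    # Numerically stable softmax over the refined logits.
    m = max(refined)
    exps = [math.exp(z - m) for z in refined]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Predicted class and its confidence after refinement.
    label = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[label]
    # Class-wise threshold (assumed to be applied to the refined confidence).
    return label if confidence >= tau[label] else None


# A negative offset on class 0 flips the prediction to class 1 (about 0.55 confidence).
print(refine_and_threshold([2.0, 1.5, 0.0], [-1.0, 0.0, 0.0], [0.9, 0.5, 0.9]))
```

Without the offset, the same sample would be predicted as class 0 with confidence of roughly 0.57 and rejected by $\tau_0 = 0.9$, so the two parameter sets act on different failure modes: $\boldsymbol{\beta}$ changes which class is chosen, $\boldsymbol{\tau}$ changes whether it is kept.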

Abstract

Semi-supervised learning (SSL) algorithms struggle to perform well when exposed to imbalanced training data. In this scenario, the generated pseudo-labels can exhibit a bias towards the majority class, and models that employ these pseudo-labels can further amplify this bias. Here we investigate pseudo-labeling strategies for imbalanced SSL, including pseudo-label refinement and threshold adjustment, through the lens of statistical analysis. We find that existing SSL algorithms that generate pseudo-labels using heuristic strategies or uncalibrated model confidence are unreliable when imbalanced class distributions bias pseudo-labels. To address this, we introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL) to enhance the quality of pseudo-labeling for imbalanced SSL. We propose to learn refinement and thresholding parameters from a partition of the training dataset in a class-balanced way. SEVAL adapts to specific tasks with improved pseudo-label accuracy and ensures pseudo-label correctness on a per-class basis. Our experiments show that SEVAL surpasses state-of-the-art SSL methods, delivering more accurate and effective pseudo-labels in various imbalanced SSL situations. With its simplicity and flexibility, SEVAL can effectively enhance various SSL techniques. The code is publicly available (https://github.com/ZerojumpLine/SEVAL).
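The abstract states that refinement parameters are learned from a held-out partition of the training data in a class-balanced way. A minimal sketch of that idea, under our own assumptions: offsets are additive per-class logit shifts, the class-balanced objective is balanced accuracy (mean per-class recall) on the held-out labeled samples, and the optimizer is a simple coordinate-wise grid search. The grid search is a stand-in chosen for clarity, not necessarily the optimizer SEVAL uses, and `balanced_accuracy`, `fit_offsets` and the toy data are hypothetical.

```python
def balanced_accuracy(logits, labels, beta, num_classes):
    """Mean per-class recall of argmax(logits + beta) on held-out labeled data."""
    correct = [0] * num_classes
    count = [0] * num_classes
    for z, y in zip(logits, labels):
        refined = [zi + bi for zi, bi in zip(z, beta)]
        pred = max(range(num_classes), key=refined.__getitem__)
        count[y] += 1
        correct[y] += int(pred == y)
    # Average recall over classes that appear in the held-out set.
    recalls = [c / n for c, n in zip(correct, count) if n > 0]
    return sum(recalls) / len(recalls)


def fit_offsets(logits, labels, num_classes, grid, sweeps=3):
    """Stand-in optimizer: coordinate-wise grid search over per-class offsets,
    keeping a change only if it strictly improves balanced accuracy."""
    beta = [0.0] * num_classes
    best = balanced_accuracy(logits, labels, beta, num_classes)
    for _ in range(sweeps):
        for c in range(num_classes):
            for value in grid:
                trial = beta.copy()
                trial[c] = value
                score = balanced_accuracy(logits, labels, trial, num_classes)
                if score > best:
                    best, beta = score, trial
    return beta, best


# Toy held-out set: the model is biased towards class 0 (the "majority" class),
# so every class-1 sample is initially misclassified.
held_out_logits = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.5], [0.8, 0.5]]
held_out_labels = [0, 0, 1, 1]
print(fit_offsets(held_out_logits, held_out_labels, 2, [-1.0, -0.5, 0.0, 0.5, 1.0]))
```

On this toy data the search moves balanced accuracy from 0.5 to 1.0 by setting $\beta_0 = -1$. Optimizing a per-class average rather than plain accuracy is what keeps the majority class from dominating the objective.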


Paper Structure (46 sections, 5 theorems, 24 equations, 6 figures, 11 tables)


Introduction
Related Work
Semi-Supervised Learning
Imbalanced Semi-Supervised Learning
Long-tailed learning-based methods
Pseudo-label refinement-based methods
Threshold adjustment-based methods
Limitations of Current Methods
Preliminaries
Pseudo-Label Refinement
Threshold Adjustment
SEVAL
Learning Pseudo-Label Refinement
Learning Threshold Adjustment
Curriculum Learning
...and 31 more sections

Key Result

Proposition 1

Given that a classifier $f^{*}(X)$ is optimized on $P_\mathcal{X}(X,Y)$, $f^{*}(X) - \log P_\mathcal{X}(Y) + \log P_\mathcal{T}(Y)$ is the optimal Bayes classifier on $P_\mathcal{T}(X,Y)$, where $P_\mathcal{X}(X|Y) = P_\mathcal{T}(X|Y)$ and $P_\mathcal{X}(Y) \neq P_\mathcal{T}(Y)$.
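The label-shift argument behind Proposition 1 can be checked with Bayes' rule. Assuming (our reading) that $f^{*}(X)_y$ equals $\log P_\mathcal{X}(y \mid X)$ up to a class-independent constant, the Bayes decision on $P_\mathcal{T}$ differs from that on $P_\mathcal{X}$ only by the per-class additive term $-\log P_\mathcal{X}(y) + \log P_\mathcal{T}(y)$. This is why refinement can take the form of class-wise offsets:

```latex
\begin{aligned}
P_\mathcal{T}(y \mid X)
  &= \frac{P_\mathcal{T}(X \mid y)\,P_\mathcal{T}(y)}{P_\mathcal{T}(X)}
   = \frac{P_\mathcal{X}(X \mid y)\,P_\mathcal{T}(y)}{P_\mathcal{T}(X)}
   && \text{(shared class-conditionals)} \\
  &= P_\mathcal{X}(y \mid X)\,\frac{P_\mathcal{T}(y)}{P_\mathcal{X}(y)}\,\frac{P_\mathcal{X}(X)}{P_\mathcal{T}(X)}
   && \text{(Bayes' rule on } P_\mathcal{X}\text{)} \\
\arg\max_y P_\mathcal{T}(y \mid X)
  &= \arg\max_y \left[\log P_\mathcal{X}(y \mid X) - \log P_\mathcal{X}(y) + \log P_\mathcal{T}(y)\right]
   && \text{(last factor does not depend on } y\text{)}
\end{aligned}
```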

Figures (6)

Figure 1: Two-moons toy experiments illustrating the relationship between threshold choice and model performance for class . Accuracy appears in the bottom right. Current dynamic thresholding methods based on maximum class probability, such as FlexMatch (zhang2021flexmatch), emphasize Recall and may not be reliable for Case 3 and Case 4. In comparison, SEVAL-derived thresholds reflect Precision and fit all cases well.
Figure 2: Overview of the SEVAL optimization process, which consists of two learning strategies aimed at mitigating bias in pseudo-labels in imbalanced SSL scenarios: 1) pseudo-label refinement and 2) threshold adjustment. The curriculum for parameter learning is determined by evaluating performance on held-out data, which improves accuracy while preventing overfitting.
Figure 3: (a) The evolution of Gain across training iterations. SEVAL accumulates a higher pseudo-label accuracy than its counterparts. (b) The evolution of Correctness across training iterations. SEVAL achieves a better trade-off between pseudo-label quality and quantity. (c) The evolution of test accuracy across training iterations. SEVAL-PL outperforms other pseudo-label refinement methods.
Figure 4: (a) Test accuracy when SEVAL is adapted to pseudo-label-based SSL algorithms other than FixMatch under the CIFAR-10 $n_1=1500$ setting. SEVAL readily improves the performance of other SSL algorithms. (b) Test accuracy when SEVAL employs different types of post-hoc adjustment parameters. The learned post-hoc parameters consistently enhance performance, particularly in CIFAR-10 experiments. (c) Test accuracy when SEVAL is optimized using different numbers of validation samples under the CIFAR-10 $n_1=500$ setting. SEVAL requires only a few validation samples to learn the optimal curriculum of parameters.
Figure 5: Correlation of different metrics with the test Precision of FixMatch on CIFAR10-LT $n_{1}=500$. (a) Correlation of the SEVAL-learned $\tau_c$ and of the maximum class probability $P'_c$ with test Precision. Each point represents a class $c$, and the point size indicates the number of samples $n_c$ in the labeled training dataset. Note that the maximum class probability $P'_c$ is the basis from which current dynamic threshold methods derive thresholds; for example, FlexMatch selects more samples for classes with lower $P'_c$. However, as highlighted by the red arrows, $P'_c$ does not correlate with Precision, so methods based on $P'_c$ fail in Case 3 (High Recall & High Precision) and Case 4 (Low Recall & Low Precision) in Fig. 1. (b) Because the network output probabilities are not calibrated, the precision estimated from them does not align with the actual Precision and thus cannot serve as a reliable metric for deriving thresholds.
...and 1 more figure
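Figures 1 and 5 contrast recall-oriented thresholds with precision-oriented ones and note that precision estimated from uncalibrated confidences is unreliable. The sketch below illustrates both points under our own assumptions: it measures per-class precision on held-out labeled data, compares it with a confidence-based estimate, and picks a class-wise threshold as the lowest candidate whose measured precision meets a target. The function names, the "lowest threshold meeting a precision target" rule and the toy numbers are hypothetical and not SEVAL's exact objective.

```python
def class_precision(confs, preds, labels, c, t):
    """Measured precision on held-out labeled data: fraction of samples
    predicted as class c with confidence >= t that truly belong to c."""
    hits = [y == c for p, q, y in zip(confs, preds, labels) if q == c and p >= t]
    return sum(hits) / len(hits) if hits else None


def estimated_precision(confs, preds, c, t):
    """Confidence-based estimate: mean confidence of the same selection.
    It matches the measured precision only if the model is calibrated."""
    selected = [p for p, q in zip(confs, preds) if q == c and p >= t]
    return sum(selected) / len(selected) if selected else None


def precision_threshold(confs, preds, labels, c, target, candidates):
    """Hypothetical rule: lowest candidate threshold whose measured precision
    for class c reaches `target`; falls back to the highest candidate."""
    for t in sorted(candidates):
        prec = class_precision(confs, preds, labels, c, t)
        if prec is not None and prec >= target:
            return t
    return max(candidates)


# Toy held-out predictions for class 0 from an overconfident model.
confs = [0.99, 0.97, 0.95, 0.92, 0.90, 0.85]
preds = [0, 0, 0, 0, 0, 0]
labels = [0, 0, 0, 1, 0, 1]
print(class_precision(confs, preds, labels, 0, 0.8))      # measured precision
print(estimated_precision(confs, preds, 0, 0.8))          # confidence-based estimate
print(precision_threshold(confs, preds, labels, 0, 0.75, [0.8, 0.9, 0.96]))
```

At $t = 0.8$ the mean confidence of the selection is 0.93, while the measured precision is only about 0.67. This is the kind of gap Figure 5(b) attributes to missing calibration, and the reason the threshold here is chosen from measured rather than estimated precision.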

Theorems & Definitions (5)

Proposition 1
Corollary 2
Theorem 3
Lemma 4
Lemma 5
