Uncertainty-aware Long-tailed Weights Model the Utility of Pseudo-labels for Semi-supervised Learning
Jiaqi Wu, Junbiao Pang, Qingming Huang
TL;DR
This work tackles threshold sensitivity and over-confidence in pseudo-label-based SSL by introducing the Uncertainty-aware Ensemble Structure (UES), a lightweight, architecture-agnostic framework that models pseudo-label utility with long-tailed weights. UES combines mean-based sample uncertainty and prediction-head uncertainty to produce per-sample and per-head weights, which are integrated into SSL losses without fixed confidence thresholds. The method uses a mean ensemble reference and softmax-based head weighting to generate robust pseudo-labels. It improves performance on semi-supervised pose estimation and CIFAR classification, outperforming baselines such as CBE and FixMatch. These results indicate that incorporating uncertainty into pseudo-label weighting yields more informative supervisory signals and greater robustness across tasks.
Abstract
Current Semi-supervised Learning (SSL) adopts the pseudo-labeling strategy and filters pseudo-labels with confidence thresholds. However, this mechanism has notable drawbacks: 1) setting a reasonable threshold is an open problem that significantly influences the selection of high-quality pseudo-labels; and 2) because labeled data are scarce, deep models often exhibit over-confidence, which makes the confidence value an unreliable indicator of pseudo-label quality. In this paper, we propose an Uncertainty-aware Ensemble Structure (UES) to assess the utility of pseudo-labels for unlabeled samples. We further model the utility of pseudo-labels as long-tailed weights, which avoids the open problem of setting the threshold. Concretely, long-tailed weights ensure that even unreliable pseudo-labels still contribute to enhancing the model's robustness. Moreover, UES is lightweight and architecture-agnostic, and it extends easily to various computer vision tasks, including classification and regression. Experimental results demonstrate that combining the proposed method with DualPose leads to a 3.47% improvement in Percentage of Correct Keypoints (PCK) on the Sniffing dataset with 100 data points (30 labeled), a 7.29% improvement in PCK on the FLIC dataset with 100 data points (50 labeled), and a 3.91% improvement in PCK on the LSP dataset with 200 data points (100 labeled). Furthermore, when combined with FixMatch, the proposed method achieves a 0.2% accuracy improvement on the CIFAR-10 dataset with 40 labeled data points and a 0.26% accuracy improvement on the CIFAR-100 dataset with 400 labeled data points.
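To make the weighting idea concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes K prediction heads that each output one scalar per unlabeled sample (as in a regression setting), takes the mean over heads as the ensemble reference, measures uncertainty as squared deviation from that reference, and uses an exponential decay as one possible long-tailed weight. The function name `ues_weights`, the `temperature` parameter, and the exact uncertainty and weight formulas are all assumptions introduced here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ues_weights(head_preds, temperature=1.0):
    """Hypothetical sketch of uncertainty-based weighting.

    head_preds: K lists (one per head), each holding N scalar predictions.
    Returns (pseudo_labels, sample_weights, head_weights).
    """
    k = len(head_preds)
    n = len(head_preds[0])

    # Mean ensemble reference for each sample.
    mean_ref = [sum(head_preds[h][i] for h in range(k)) / k for i in range(n)]

    # Sample uncertainty (assumed form): variance of the heads around the mean.
    sample_unc = [
        sum((head_preds[h][i] - mean_ref[i]) ** 2 for h in range(k)) / k
        for i in range(n)
    ]

    # Head uncertainty (assumed form): mean squared deviation from the reference.
    head_unc = [
        sum((head_preds[h][i] - mean_ref[i]) ** 2 for i in range(n)) / n
        for h in range(k)
    ]

    # Softmax-based head weighting: lower uncertainty gives a larger weight.
    head_w = softmax([-u / temperature for u in head_unc])

    # Pseudo-label as the head-weighted combination of predictions.
    pseudo = [sum(head_w[h] * head_preds[h][i] for h in range(k)) for i in range(n)]

    # Long-tailed sample weight (assumed exponential form): it decays with
    # uncertainty but never reaches zero, so no sample is discarded outright.
    sample_w = [math.exp(-u / temperature) for u in sample_unc]
    return pseudo, sample_w, head_w


# Three heads, two unlabeled samples: the heads agree on sample 0 only.
preds = [[0.5, 0.1], [0.5, 0.9], [0.5, 0.5]]
pseudo, sample_w, head_w = ues_weights(preds)
```

In this toy input, the sample on which the heads disagree receives a smaller but still positive weight, in contrast to a hard confidence threshold, which would either keep it at full weight or drop it entirely.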
