CalibrateMix: Guided-Mixup Calibration of Image Semi-Supervised Models
Mehrab Mustafy Rahman, Jayanth Mohan, Tiberiu Sosea, Cornelia Caragea
TL;DR
CalibrateMix tackles miscalibration in image semi-supervised learning with a targeted mixup strategy guided by training dynamics. It uses AUM for labeled data and APM for unlabeled data to separate easy-to-learn from hard-to-learn samples, then performs dissimilarity-based mixups between easy and hard samples to inject calibrated uncertainty. Empirical results on CIFAR and on large-scale datasets such as ImageNet and WebVision show substantial reductions in expected calibration error (ECE) and often improved accuracy, especially in low-label settings. Ablations validate the importance of warmup and dissimilar-sample pairing. The approach is compatible with FixMatch, FlexMatch, and SoftMatch, offering a practical, scalable way to produce better-calibrated SSL models for real-world deployment.
Abstract
Semi-supervised learning (SSL) has demonstrated high performance in image classification tasks by effectively utilizing both labeled and unlabeled data. However, existing SSL methods often suffer from poor calibration, with models yielding overconfident predictions that misrepresent actual prediction likelihoods. Recently, neural networks trained with mixup, which linearly interpolates random examples from the training set, have shown better calibration in supervised settings. Calibration of neural models nevertheless remains under-explored in semi-supervised settings. Although random mixup is effective for calibrating supervised models, applying it to pseudolabels in SSL is challenging because pseudolabels are often overconfident and unreliable. In this work, we introduce CalibrateMix, a targeted mixup-based approach that aims to improve the calibration of SSL models while maintaining or even improving their classification accuracy. Our method leverages training dynamics of labeled and unlabeled samples to identify "easy-to-learn" and "hard-to-learn" samples, which are then paired in a targeted mixup of easy and hard samples. Experimental results across several benchmark image datasets show that our method achieves lower expected calibration error (ECE) and superior accuracy compared to existing SSL approaches.
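To make the two central quantities concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows the standard binned ECE estimate and a simplified easy/hard mixup. The function names, the `scores` array (standing in for a per-sample training-dynamics score such as AUM or APM), the `threshold`, and the Beta parameter `alpha` are all assumptions made for illustration. In particular, this sketch pairs easy and hard samples at random, whereas the method described above selects dissimilar pairs.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: bin-weighted mean of |accuracy - confidence|."""
    conf = probs.max(axis=1)            # confidence of the predicted class
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece


def easy_hard_mixup(x, y_onehot, scores, threshold, alpha=1.0, rng=None):
    """Mix each selected easy sample with a hard sample (hypothetical helper).

    `scores` is an assumed per-sample training-dynamics score; samples at or
    above `threshold` are treated as easy, the rest as hard. Pairing here is
    random, which is a simplification of dissimilarity-based pairing.
    """
    rng = np.random.default_rng() if rng is None else rng
    easy = np.flatnonzero(scores >= threshold)
    hard = np.flatnonzero(scores < threshold)
    n = min(len(easy), len(hard))
    if n == 0:
        return x[:0], y_onehot[:0]      # nothing to mix
    e = rng.permutation(easy)[:n]
    h = rng.permutation(hard)[:n]
    lam = rng.beta(alpha, alpha, size=n)            # mixup coefficients
    lam_x = lam.reshape(-1, *([1] * (x.ndim - 1)))  # broadcast over image dims
    x_mix = lam_x * x[e] + (1.0 - lam_x) * x[h]
    y_mix = lam[:, None] * y_onehot[e] + (1.0 - lam[:, None]) * y_onehot[h]
    return x_mix, y_mix
```

Because the mixed targets are convex combinations of one-hot (or pseudolabel) vectors, each mixed target still sums to one but is softer than either input, which is the mechanism by which mixup discourages overconfident predictions.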
