RegMixMatch: Optimizing Mixup Utilization in Semi-Supervised Learning
Haorong Han, Jidong Yuan, Chixuan Wei, Zhongyang Yu
TL;DR
The paper addresses two limitations in semi-supervised learning (SSL): Mixup can dilute the purity of artificial labels, and confidence thresholding leaves low-confidence samples underused. It introduces RegMixMatch, which combines semi-supervised RegMixup (SRM), which retains clean samples alongside selected mixed samples for high-confidence data, with class-aware Mixup (CAM), which uses top-2 class information to safely incorporate low-confidence samples via targeted mixing, along with ResizeMix for data augmentation. The approach yields strong empirical gains across multiple benchmarks, achieving state-of-the-art results in many settings and improving learning efficiency. By making both high- and low-confidence unlabeled data usable, the work supports more effective training when labels are limited.
Abstract
Consistency regularization and pseudo-labeling have significantly advanced semi-supervised learning (SSL). Prior works have effectively employed Mixup for consistency regularization in SSL. However, our findings indicate that applying Mixup for consistency regularization may degrade SSL performance by compromising the purity of artificial labels. Moreover, most pseudo-labeling-based methods use a thresholding strategy to exclude low-confidence data, aiming to mitigate confirmation bias; however, this approach limits the utility of unlabeled samples. To address these challenges, we propose RegMixMatch, a novel framework that optimizes the use of Mixup with both high- and low-confidence samples in SSL. First, we introduce semi-supervised RegMixup, which addresses the reduced purity of artificial labels by using both mixed samples and clean samples for training. Second, we develop a class-aware Mixup technique that integrates information from the top-2 predicted classes into low-confidence samples and their artificial labels, reducing the confirmation bias associated with these samples and enhancing their effective utilization. Experimental results demonstrate that RegMixMatch achieves state-of-the-art performance across various SSL benchmarks.
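To make the building blocks named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows standard Mixup (a convex combination of two inputs and their label vectors), a confidence split of pseudo-label predictions, and extraction of the top-2 predicted classes. The function names, the list-based data layout, and the threshold value of 0.95 are assumptions chosen for illustration; how RegMixMatch actually combines these pieces is not specified in the abstract.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard Mixup: convex combination of two inputs and their label vectors.

    lam is the mixing coefficient in [0, 1]; in practice it is usually
    sampled from a Beta distribution, which is omitted here.
    """
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def split_by_confidence(probs, threshold=0.95):
    """Split sample indices by the maximum predicted class probability.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    high, low = [], []
    for i, p in enumerate(probs):
        (high if max(p) >= threshold else low).append(i)
    return high, low


def top2_classes(p):
    """Return the indices of the two most probable classes, most probable first."""
    order = sorted(range(len(p)), key=lambda k: p[k], reverse=True)
    return order[0], order[1]


# Toy predictions for two unlabeled samples over three classes.
probs = [[0.97, 0.02, 0.01], [0.5, 0.4, 0.1]]
high, low = split_by_confidence(probs)
first, second = top2_classes(probs[low[0]])
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.75)
```

In this toy run the first sample falls in the high-confidence group and the second in the low-confidence group, whose two most likely classes are 0 and 1. The mixed label `mixed_y` is soft rather than one-hot, which illustrates the loss of label purity that the abstract attributes to Mixup.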
