SemiReward: A General Reward Model for Semi-supervised Learning
Siyuan Li, Weiyang Jin, Zedong Wang, Fang Wu, Zicheng Liu, Cheng Tan, Stan Z. Li
TL;DR
This work tackles a core problem in semi-supervised learning (SSL), unreliable pseudo labels and the resulting confirmation bias, by introducing SemiReward, a general rewarder $\mathcal{R}$ that outputs a reward score $r\in[0,1]$ used to select high-quality pseudo labels. The rewarder is trained online in a two-stage workflow together with a lightweight generator, which decouples student training from reward estimation, and it uses a cosine-based label similarity $\mathcal{S}(y^u,y^l)$ as the target for $\mathcal{R}$. Empirically, SemiReward yields substantial accuracy gains and faster convergence across 13 SSL benchmarks spanning CV, NLP, and audio, and it remains compatible with diverse SSL methods such as Pseudo Label, FlexMatch, and Free/SoftMatch. It thus offers a practical, modular enhancement that improves pseudo-label quality without large overhead and applies across tasks and modalities.
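The two ingredients named above, a cosine-based similarity target $\mathcal{S}(y^u,y^l)$ and filtering by a reward score in $[0,1]$, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that $y^u$ and $y^l$ are label vectors (for example, a soft pseudo label and a one-hot ground-truth label) and that the cosine similarity is rescaled from $[-1,1]$ to $[0,1]$. The function names `label_similarity` and `filter_by_reward` and the fixed `threshold` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def label_similarity(y_u: Sequence[float], y_l: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Cosine similarity between two label vectors, rescaled to [0, 1].

    Assumption: the rescaling 0.5 * (cos + 1) is one plausible way to map
    the cosine into the reward range; the paper may define S differently.
    """
    dot = sum(a * b for a, b in zip(y_u, y_l))
    norm_u = math.sqrt(sum(a * a for a in y_u))
    norm_l = math.sqrt(sum(b * b for b in y_l))
    cos = dot / (norm_u * norm_l + eps)  # eps guards against zero vectors
    return 0.5 * (cos + 1.0)


def filter_by_reward(pseudo_labels: Sequence, rewards: Sequence[float],
                     threshold: float = 0.5) -> List:
    """Keep only the pseudo labels whose reward score reaches the threshold.

    Assumption: a fixed threshold is used here for simplicity.
    """
    return [y for y, r in zip(pseudo_labels, rewards) if r >= threshold]


if __name__ == "__main__":
    # A confident, correct pseudo label is close to the one-hot target.
    print(label_similarity([0.9, 0.05, 0.05], [1.0, 0.0, 0.0]))
    # A pseudo label that puts its mass on the wrong class scores lower.
    print(label_similarity([0.05, 0.9, 0.05], [1.0, 0.0, 0.0]))
    # Only samples with a sufficiently high reward survive the filter.
    print(filter_by_reward(["a", "b", "c"], [0.9, 0.2, 0.5]))
```

In this sketch the similarity serves as the supervision signal a rewarder would be trained to predict, while the filter shows how predicted scores could gate which pseudo labels reach the student.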
Abstract
Semi-supervised learning (SSL) has witnessed great progress through various improvements to the self-training framework with pseudo labeling. The main challenge is how to identify high-quality pseudo labels in the presence of confirmation bias. However, existing pseudo-label selection strategies are limited to pre-defined schemes or complex hand-crafted policies designed specifically for classification, and they fail to achieve high-quality labels, fast convergence, and task versatility simultaneously. To this end, we propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate pseudo labels and select the high-quality ones, and that can be plugged into mainstream SSL methods across a wide range of task types and scenarios. To mitigate confirmation bias, SemiReward is trained online in two stages with a generator model and a subsampling strategy. On classification and regression tasks from 13 standard SSL benchmarks across three modalities, extensive experiments verify that SemiReward achieves significant performance gains and faster convergence over Pseudo Label, FlexMatch, and Free/SoftMatch. Code and models are available at https://github.com/Westlake-AI/SemiReward.
