Contrastive Credibility Propagation for Reliable Semi-Supervised Learning
Brody Kutt, Pralay Ramteke, Xavier Mignot, Pamela Toman, Nandini Ramanan, Sujit Rokka Chhetri, Shan Huang, Min Du, William Hewlett
TL;DR
This work tackles the reliability gap in semi-supervised learning by introducing Contrastive Credibility Propagation (CCP), a framework that iteratively refines transductive pseudo-labels, represented as credibility vectors, to handle label noise and data variability. CCP unifies pseudo-labeling with a generalized, softly supervised contrastive loss and an outer SEAL-inspired loop that mitigates instance-dependent errors, giving robust performance across few-label, open-set, noisy-label, and class-distribution imbalance and misalignment scenarios. The approach is complemented by a KL-divergence-based subsampling strategy and a structured training pipeline that builds a final classifier after iterative refinement. Empirical results on CIFAR-10/100 and text datasets show that CCP consistently matches or outperforms a supervised baseline across five realistic data scenarios, which makes it practical for practitioners facing uncertain data quality.
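As a rough illustration of what a credibility vector could look like, the sketch below assumes one plausible definition: each class's credibility is its score minus the strongest competing score, clipped at zero, so that a pseudo-label carries weight only to the extent that its class clearly beats every rival. Both this definition and the function name `credibility_vector` are assumptions made for illustration; this summary does not spell out the exact formula.

```python
from typing import List


def credibility_vector(scores: List[float]) -> List[float]:
    """Turn per-class scores into a credibility vector (illustrative assumption).

    Assumed rule: credibility of class k = score_k - max(score_j for j != k),
    clipped below at 0. Ambiguous predictions therefore get near-zero weight.
    """
    if len(scores) < 2:
        raise ValueError("need at least two classes")
    cred = []
    for k, s in enumerate(scores):
        # Strongest competing class score, excluding class k itself.
        strongest_rival = max(v for j, v in enumerate(scores) if j != k)
        cred.append(max(0.0, s - strongest_rival))
    return cred


confident = credibility_vector([0.7, 0.2, 0.1])   # only the top class keeps weight
ambiguous = credibility_vector([0.4, 0.4, 0.2])   # a tie leaves no credible class
```

Under this assumed rule, a confident prediction keeps a single nonzero entry, while a tied prediction collapses to the all-zero vector and would contribute nothing to a credibility-weighted loss.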
Abstract
Producing labels for unlabeled data is error-prone, making semi-supervised learning (SSL) troublesome. Often, little is known about when and why an algorithm fails to outperform a supervised baseline. Using benchmark datasets, we craft five common real-world SSL data scenarios: few-label, open-set, noisy-label, and class distribution imbalance and misalignment between the labeled and unlabeled sets. We propose a novel algorithm called Contrastive Credibility Propagation (CCP) for deep SSL via iterative transductive pseudo-label refinement. CCP unifies semi-supervised learning and noisy-label learning with the goal of reliably outperforming a supervised baseline in any data scenario. Compared to prior methods, which focus on a subset of scenarios, CCP uniquely outperforms the supervised baseline in all scenarios, supporting practitioners when the qualities of the labeled or unlabeled data are unknown.
