Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical
Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, Masashi Sugiyama
TL;DR
This work tackles learning from complementary labels without relying on the common uniform-distribution or anchor-label assumptions. It introduces SCARCE, an approach based on the Selected-Completely-at-Random (SCAR) assumption that yields an unbiased risk estimator, together with a risk-correction mechanism to control overfitting, and it reveals a natural connection to negative-unlabeled learning under the one-versus-rest (OVR) strategy. Theoretical results establish calibration to the 0-1 loss and an estimation error bound, ensuring consistency as the amount of data grows. Empirically, SCARCE outperforms state-of-the-art methods on diverse synthetic and real-world benchmarks, demonstrating practical effectiveness and robustness to non-uniform labeling and mild class-prior misspecification. The work also discusses class-prior estimation via Best Bin Estimation, highlighting a feasible workflow for practical deployment.
Abstract
Complementary-label learning is a weakly supervised learning problem in which each training example is associated with one or multiple complementary labels indicating the classes to which it does not belong. Existing consistent approaches have relied on the uniform distribution assumption to model the generation of complementary labels, or on an ordinary-label training set to estimate the transition matrix in non-uniform cases. However, either condition may not be satisfied in real-world scenarios. In this paper, we propose a novel consistent approach that does not rely on these conditions. Inspired by the positive-unlabeled (PU) learning literature, we propose an unbiased risk estimator based on the Selected-Completely-at-Random assumption for complementary-label learning. We then introduce a risk-correction approach to address overfitting problems. Furthermore, we find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems when using the one-versus-rest strategy. Extensive experimental results on both synthetic and real-world benchmark datasets validate the superiority of our proposed approach over state-of-the-art methods.
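To make the negative-unlabeled view concrete, the following is a minimal sketch of a generic risk estimate for a single one-versus-rest task (class k versus the rest). It is not the SCARCE estimator as defined in the paper. It assumes that examples carrying complementary label k are drawn from p(x | y ≠ k), that unlabeled examples are drawn from p(x), and that the class prior π_k = p(y = k) is known. Under those assumptions, the identity p(x) = π_k p(x | y = k) + (1 − π_k) p(x | y ≠ k) lets the positive-class term be rewritten using only unlabeled and negative data. The function names, the logistic loss, and the absolute-value correction are illustrative choices, not details taken from the source.

```python
import numpy as np


def logistic_loss(z):
    """Logistic loss log(1 + exp(-z)), computed in a numerically stable way."""
    return np.logaddexp(0.0, -z)


def nu_risk(scores_neg, scores_unl, prior, corrected=True):
    """Risk estimate for one binary task (class k vs. rest) from NU data.

    scores_neg : scores f_k(x) on examples known NOT to belong to class k
    scores_unl : scores f_k(x) on unlabeled examples drawn from p(x)
    prior      : assumed class prior pi_k = p(y = k)
    corrected  : if True, wrap the rewritten positive term in abs(), one
                 simple (assumed) way to stop it from going negative
    """
    scores_neg = np.asarray(scores_neg, dtype=float)
    scores_unl = np.asarray(scores_unl, dtype=float)
    neg_weight = 1.0 - prior

    # Negative part: (1 - pi_k) * E_N[loss(-f_k(x))]
    neg_part = neg_weight * logistic_loss(-scores_neg).mean()

    # Positive part, rewritten without positive data:
    # pi_k * E_P[loss(f_k(x))] = E_U[loss(f_k(x))] - (1 - pi_k) * E_N[loss(f_k(x))]
    pos_part = (logistic_loss(scores_unl).mean()
                - neg_weight * logistic_loss(scores_neg).mean())

    if corrected:
        # On finite samples the difference above can become negative,
        # which a flexible model can exploit to overfit.
        pos_part = abs(pos_part)

    return neg_part + pos_part
```

In a multi-class setting, such a per-class estimate would be summed over all classes, with each class k using its own scoring function and its own set of complementary-labeled examples. Without the correction, the rewritten positive term can be driven below zero on finite samples, which is the kind of overfitting that a risk-correction step is meant to prevent.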
