CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
Shiyu Tian, Hongxin Wei, Yiqun Wang, Lei Feng
TL;DR
CroSel tackles partial-label learning (PLL) with two components: a cross selection strategy, in which two models use their historical predictions to select confident pseudo labels for each other, and a co-mix consistency regularization term, which combines weak and strong augmentations with MixUp to provide training targets for the samples left unselected. The method achieves state-of-the-art results on benchmarks such as SVHN, CIFAR-10, and CIFAR-100, with high true-label selection ratios (often above 90%), and the improvements are supported by ablation studies. By combining memory-bank-driven label selection with dynamic supervision and test-time ensembling, CroSel reduces label ambiguity and sustains the learning signal throughout training. This makes it a practical and scalable option for PLL on real-world datasets with noisy candidate labels and limited supervision.
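To make the selection idea concrete, here is a minimal NumPy sketch of memory-bank-driven cross selection. It is an illustration under stated assumptions, not the authors' implementation: the three criteria (prediction inside the candidate set, prediction stable across the stored epochs, mean confidence above a threshold), the function name `select_confident`, the threshold value, and the toy data are all hypothetical.

```python
import numpy as np

def select_confident(history, candidates, threshold=0.9):
    """Pick samples whose pseudo label looks reliable (assumed criteria).

    history:    (t, n, k) softmax outputs of ONE model over its last t epochs
    candidates: (n, k) binary mask, 1 where a label is in the candidate set
    Returns a boolean mask of selected samples and the proposed labels.
    """
    n = history.shape[1]
    idx = np.arange(n)
    preds = history.argmax(axis=2)                 # (t, n) predicted label per epoch
    latest = preds[-1]                             # most recent prediction
    stable = (preds == latest).all(axis=0)         # unchanged across all t epochs
    in_cands = candidates[idx, latest].astype(bool)  # prediction is a candidate label
    mean_conf = history[:, idx, latest].mean(axis=0)  # average confidence in that label
    return stable & in_cands & (mean_conf > threshold), latest

# Toy memory banks: t=2 epochs, n=3 samples, k=3 classes.
candidates = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1]])
history_a = np.array([
    [[0.95, 0.03, 0.02], [0.50, 0.40, 0.10], [0.05, 0.92, 0.03]],
    [[0.97, 0.02, 0.01], [0.30, 0.60, 0.10], [0.04, 0.94, 0.02]],
])
history_b = np.array([
    [[0.90, 0.05, 0.05], [0.05, 0.93, 0.02], [0.10, 0.10, 0.80]],
    [[0.92, 0.04, 0.04], [0.03, 0.95, 0.02], [0.05, 0.05, 0.90]],
])

# Cross selection: each model's history selects the labeled set for the OTHER model.
mask_for_b, labels_for_b = select_confident(history_a, candidates)
mask_for_a, labels_for_a = select_confident(history_b, candidates)
```

In this toy run, model A's history rejects the second sample because its prediction flips between epochs, and rejects the third because the predicted class lies outside the candidate set. Samples that neither model selects would be handled by the consistency regularization term rather than discarded.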
Abstract
Partial-label learning (PLL) is an important weakly supervised learning problem in which each training example is associated with a candidate label set rather than a single ground-truth label. Identification-based methods, which treat the true label as a latent variable to be identified, have been widely explored to address label ambiguity in PLL. However, identifying the true labels accurately and completely remains challenging, and errors introduce noise into the pseudo labels used during model training. In this paper, we propose a new method called CroSel, which leverages historical predictions from the model to identify the true labels of most training examples. First, we introduce a cross selection strategy, which enables two deep models to select true labels of partially labeled data for each other. Second, we propose a novel consistency regularization term called co-mix, which avoids wasting unselected samples and mitigates the small amount of noise caused by false selection. In this way, CroSel can pick out the true labels of most examples with high precision. Extensive experiments demonstrate the superiority of CroSel, which consistently outperforms previous state-of-the-art methods on benchmark datasets. Additionally, our method achieves over 90% in both the accuracy and the quantity of selected true labels on CIFAR-type datasets under various settings.
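The abstract does not spell out the co-mix term, so the following is only a rough sketch of the general idea of building MixUp targets for unselected samples. It uses standard MixUp on inputs and soft targets, with targets restricted to the candidate set. The helper names, the `alpha` value, and the `max(lam, 1 - lam)` step are assumptions for illustration, and the weak/strong augmentation step is omitted entirely.

```python
import numpy as np

def candidate_targets(probs, candidates):
    """Zero out mass on non-candidate labels and renormalize each row.

    Assumes every sample has at least one candidate label and that probs
    are strictly positive (e.g. softmax outputs), so no row sums to zero.
    """
    masked = probs * candidates
    return masked / masked.sum(axis=1, keepdims=True)

def mixup(x, targets, alpha=0.75, rng=None):
    """Standard MixUp: convex combination of a batch with a shuffled copy."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)      # assumption: keep each mix closer to its own sample
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    t_mix = lam * targets + (1.0 - lam) * targets[perm]
    return x_mix, t_mix

# Toy batch: 2 unselected samples, 3 classes, 4 input features.
probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])   # model predictions
candidates = np.array([[1, 1, 0], [0, 1, 1]])          # candidate label sets
targets = candidate_targets(probs, candidates)
x = np.random.default_rng(0).normal(size=(2, 4))
x_mix, t_mix = mixup(x, targets, rng=np.random.default_rng(1))
```

Because each target row sums to one and MixUp is a convex combination, the mixed targets remain valid probability distributions, so they can be used directly as soft labels in a cross-entropy style consistency loss.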
