Understanding and Mitigating the Bias in Sample Selection for Learning with Noisy Labels

Qi Wei, Lei Feng, Haobo Wang, Bo An

TL;DR

The paper addresses bias in sample selection for learning with noisy labels (LNL) by identifying data bias (class-imbalanced selected sets) in addition to the traditional training bias. It introduces ITEM, a noIse-Tolerant Expert Model that uses a mixture-of-experts backbone for robust selection and a Beta-based mixed sampling strategy to balance tail classes, complemented by a stochastic training regime with MixUp. Empirical results across synthetic and real-world noisy datasets show state-of-the-art performance and robustness, with ablations confirming the contributions of MoE, sampling, and augmentation. The work offers a practical approach to debiased learning in LNL that trains with fewer parameters than the prevailing double-branch networks and improves generalization across diverse noise regimes.

Abstract

Learning with noisy labels aims to ensure model generalization given a label-corrupted training set. The sample selection strategy achieves promising performance by selecting a label-reliable subset for model training. In this paper, we empirically reveal that existing sample selection methods suffer from both data bias and training bias, which manifest in practice as imbalanced selected sets and accumulated errors, respectively. However, only the training bias was handled in previous studies. To address this limitation, we propose a noIse-Tolerant Expert Model (ITEM) for debiased learning in sample selection. Specifically, to mitigate the training bias, we design a robust network architecture that integrates multiple experts. Compared with the prevailing double-branch network, our network achieves better selection and prediction performance by ensembling these experts while training with fewer parameters. Meanwhile, to mitigate the data bias, we propose a mixed sampling strategy based on two weight-based data samplers. By training on the mixture of two class-discriminative mini-batches, the model mitigates the effect of the imbalanced training set while avoiding the sparse representations that sampling strategies easily cause. Extensive experiments and analyses demonstrate the effectiveness of ITEM. Our code is available at https://github.com/1998v7/ITEM.
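As a rough illustration of the mixed sampling idea, the sketch below draws two mini-batches from two weight-based samplers and mixes them with a Beta-distributed coefficient in the MixUp style. The concrete sampler weights (uniform for one sampler, inverse class frequency for the other), the function names, and the default `alpha` are assumptions made for illustration only; the paper's actual samplers and mapping function may differ.

```python
import random
from collections import Counter


def class_weights(labels, power):
    """Per-sample weight proportional to (1 / class_count) ** power.

    power=0 gives uniform sampling, power=1 gives class-balanced sampling.
    This weighting is an illustrative assumption, not the paper's definition.
    """
    counts = Counter(labels)
    return [(1.0 / counts[y]) ** power for y in labels]


def sample_batch(features, labels, weights, batch_size, rng):
    """Draw a mini-batch with replacement according to per-sample weights."""
    idx = rng.choices(range(len(labels)), weights=weights, k=batch_size)
    return [features[i] for i in idx], [labels[i] for i in idx]


def one_hot(y, num_classes):
    return [1.0 if c == y else 0.0 for c in range(num_classes)]


def mixed_batch(features, labels, num_classes, batch_size, alpha=1.0, seed=0):
    """Mix a uniformly sampled batch with a class-balanced batch."""
    rng = random.Random(seed)
    w_uniform = class_weights(labels, power=0.0)
    w_balanced = class_weights(labels, power=1.0)
    xa, ya = sample_batch(features, labels, w_uniform, batch_size, rng)
    xb, yb = sample_batch(features, labels, w_balanced, batch_size, rng)

    # MixUp-style coefficient drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)

    mixed_x = [
        [lam * a + (1.0 - lam) * b for a, b in zip(fa, fb)]
        for fa, fb in zip(xa, xb)
    ]
    mixed_y = [
        [lam * p + (1.0 - lam) * q
         for p, q in zip(one_hot(a, num_classes), one_hot(b, num_classes))]
        for a, b in zip(ya, yb)
    ]
    return mixed_x, mixed_y, lam
```

In this sketch the class-balanced sampler raises the chance that tail classes appear in the batch, while the uniform sampler keeps the head-class examples that a purely balanced sampler would under-represent. The soft labels in `mixed_y` would be paired with a soft-target cross-entropy loss.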

Paper Structure

This paper contains 23 sections, 8 equations, 10 figures, 9 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Existing selection criteria always lead to an imbalanced training set, which we term data bias. A ResNet-34 is trained on CIFAR-100N. We visualize the class distribution of the selected set under three typical selection criteria. The number of selected samples per class is counted in the last epoch, and the class indices are sorted. More results can be found in Appendix A.
  • Figure 2: We train ResNet-18 on Sym. 60% CIFAR-100 with small-loss selection [han2018co] and fluctuation-based noise filtering [wei2022self]. (a) Results on small-loss. Left: the class distribution at three training stages. Right: the changing sample counts of four representative categories. (b) Results on two selection criteria. Visualization of the class-level selection performance (F-score) $\mathsf{F}_k$ at the 100th and 200th epochs.
  • Figure 3: Comparisons of different architectures. (a) Typical classification network, consisting of a feature extractor and a classifier layer. (b) Mixture-of-experts (MoE) [masoudnia2014mixture], in which a set of experts jointly gives the predicted label for the input. (c) Double-branch robust structure, in which the network is trained on a selected set that is considered clean by another network. (d) Ours: a mixture-of-experts module is integrated into the classification network and serves both robust selection and prediction ensembling.
  • Figure 4: Mapping function $\mathcal{S}^{\beta}(\cdot)$ with different values of $\beta$.
  • Figure 5: Visualization of debiased learning at the class level. We use ResNet-18 as the backbone and compare the class-level prediction results of various methods on CIFAR-100 with four noise types. "CE (clean)" denotes training the model on the completely clean set (50k samples in total). The class indices are sorted according to the class-level accuracy.
  • ...and 5 more figures
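
The data bias described in the captions of Figures 1 and 2 can be illustrated with a toy sketch: apply a small-loss selection rule to per-sample losses, then count how many samples of each class survive. The function names, the `keep_ratio` parameter, and the max/min imbalance measure are hypothetical choices made for this illustration, not the paper's evaluation code.

```python
from collections import Counter


def small_loss_select(losses, keep_ratio):
    """Return indices of the keep_ratio fraction of samples with the smallest loss."""
    k = int(len(losses) * keep_ratio)
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return order[:k]


def class_distribution(labels, selected, num_classes):
    """Count the selected samples per class."""
    counts = Counter(labels[i] for i in selected)
    return [counts.get(c, 0) for c in range(num_classes)]


def imbalance_ratio(dist):
    """Largest class count divided by the smallest (inf if a class is empty)."""
    smallest = min(dist)
    return float("inf") if smallest == 0 else max(dist) / smallest


# Toy data: class 0 is easy (low loss), class 2 is hard (high loss).
losses = [0.1, 0.2, 0.15, 0.9, 0.8, 0.3, 1.2, 1.1, 0.95]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

selected = small_loss_select(losses, keep_ratio=0.5)
dist = class_distribution(labels, selected, num_classes=3)
```

On this toy input the selected set is dominated by the easy class and contains no sample of the hard class, even though the original set is perfectly balanced.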