Noisy Label Processing for Classification: A Survey

Mengting Li, Chuang Zhu

TL;DR

This survey synthesizes recent advances in learning with noisy labels for image classification, categorizing noise into instance-independent, instance-dependent, and real-world human-annotated forms, and reviews four methodological families: transition-matrix estimation, regularization, sample selection, and semi-supervised learning. It introduces Real-World Data-Guided Noise (RGN), a framework that harnesses real noisy data to define two pattern indicators—a noise-transition matrix $T_s$ and a feature-concentration metric $Con_k$—to generate near-realistic synthetic noise for benchmarking on CIFAR-10N. The authors construct a CIFAR-10N–guided synthetic benchmark, evaluate representative methods across noise regimes, and find that realistic noise patterns pose greater challenges than traditional synthetic patterns, especially at high noise levels. The work provides a pathway for more faithful benchmarking and indicates a need for robust, scalable strategies that perform well under real-world noise conditions.

Abstract

In recent years, deep neural networks (DNNs) have achieved remarkable results in computer vision tasks, and their success often depends heavily on the richness of data. However, acquiring data and high-quality ground truth requires a great deal of manpower and money. In the long, tedious process of data annotation, annotators are prone to mistakes, resulting in incorrectly labeled images, i.e., noisy labels. The emergence of noisy labels is inevitable. Moreover, since research shows that DNNs can easily fit noisy labels, their presence can significantly damage model training. Therefore, it is crucial to combat noisy labels in computer vision tasks, especially classification. In this survey, we first comprehensively review the evolution of deep learning approaches for combating noisy labels in image classification. We also review the noise patterns that have been proposed for designing robust algorithms. Furthermore, we explore the inner pattern of real-world label noise and propose an algorithm to generate a synthetic label noise pattern guided by real-world data. We test the algorithm on the well-known real-world dataset CIFAR-10N to form a new real-world data-guided synthetic benchmark and evaluate some typical noise-robust methods on the benchmark.
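To make the idea of a noise-transition matrix concrete, the following is a minimal sketch, not the authors' RGN algorithm. It assumes class-conditional (instance-independent) noise, where entry $T[i][j]$ is the probability that a sample with clean label $i$ receives noisy label $j$, and it ignores the feature-concentration indicator entirely. The function names `estimate_transition_matrix` and `sample_noisy_labels` and the toy labels are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def estimate_transition_matrix(clean, noisy, num_classes):
    """Estimate T[i][j] ~ P(noisy = j | clean = i) from paired labels.

    Assumption: each row is simply the normalized count of noisy labels
    observed for that clean class (class-conditional noise).
    """
    counts = Counter(zip(clean, noisy))
    T = []
    for i in range(num_classes):
        row_total = sum(counts[(i, j)] for j in range(num_classes))
        if row_total == 0:
            # Class never observed: fall back to "label is kept" (identity row).
            T.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            T.append([counts[(i, j)] / row_total for j in range(num_classes)])
    return T


def sample_noisy_labels(clean, T, rng):
    """Draw one synthetic noisy label per sample from the row T[clean_label]."""
    classes = list(range(len(T)))
    return [rng.choices(classes, weights=T[y])[0] for y in clean]


# Toy paired labels (hypothetical data, two classes).
clean_labels = [0, 0, 0, 0, 1, 1, 1, 1]
human_labels = [0, 0, 0, 1, 1, 1, 1, 1]

T = estimate_transition_matrix(clean_labels, human_labels, num_classes=2)
print(T)  # [[0.75, 0.25], [0.0, 1.0]]

rng = random.Random(0)  # fixed seed so the synthetic labels are reproducible
synthetic = sample_noisy_labels(clean_labels, T, rng)
print(synthetic)
```

A matrix estimated this way reproduces which classes get confused with which, but not which individual images are mislabeled. That gap is why an instance-level indicator is needed in addition to the matrix when the goal is near-realistic noise.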

Paper Structure (28 sections, 16 equations, 11 figures, 2 tables, 1 algorithm)

Figures (11)

  • Figure 1: A review of learning with noisy labels for classification. Specifically, we focus on four categories and present some of the typical algorithms.
  • Figure 2: The development of noise transition matrix estimation methods.
  • Figure 3: The development of noise-robust regularization methods.
  • Figure 4: The development of sample selection methods.
  • Figure 5: The overview of the semi-supervised learning (SSL) process under label noise.
  • ...and 6 more figures