A Survey of Label-noise Representation Learning: Past, Present and Future

Bo Han, Quanming Yao, Tongliang Liu, Gang Niu, Ivor W. Tsang, James T. Kwok, Masashi Sugiyama

TL;DR

The survey formalizes Label-noise Representation Learning (LNRL) and analyzes why noisy labels degrade deep models from data, objective, and optimization perspectives. It offers a unified taxonomy distinguishing data-driven noise modeling, loss/regularization design, and memorization-based optimization strategies, and it reviews representative methods across the three axes (e.g., noise-transition layers, forward/backward correction, MentorNet/Co-teaching, Mixup, DivideMix). Key contributions include articulating essential components for robust LNRL, synthesizing theoretical and empirical insights, and outlining future directions such as instance-dependent noise and adversarial LNRL. The work emphasizes datasets, theoretical guarantees, and practical guidelines to advance robust learning in real-world, noisy-label settings across vision, language, and beyond.

Abstract

Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios. However, statistical-learning-based methods may not train deep learning models robustly with these noisy labels. Therefore, it is urgent to design Label-Noise Representation Learning (LNRL) methods for robustly training deep models with noisy labels. To fully understand LNRL, we conduct a survey study. We first clarify a formal definition for LNRL from the perspective of machine learning. Then, via the lens of learning theory and empirical study, we figure out why noisy labels affect deep models' performance. Based on the theoretical guidance, we categorize different LNRL methods into three directions. Under this unified taxonomy, we provide a thorough discussion of the pros and cons of different categories. More importantly, we summarize the essential components of robust LNRL, which can spark new directions. Lastly, we propose possible research directions within LNRL, such as new datasets, instance-dependent LNRL, and adversarial LNRL. We also envision potential directions beyond LNRL, such as learning with feature-noise, preference-noise, domain-noise, similarity-noise, graph-noise and demonstration-noise.

Paper Structure

This paper contains 70 sections, 2 theorems, 23 equations, 6 figures, 1 table.

Key Result

Theorem 1

(Backward Correction, Theorem 1 in Patrini et al., 2017) Suppose that $T$ is non-singular, where $T_{ij} = p(\bar{y} = j \mid y = i)$ is the probability that clean label $y = i$ is flipped to corrupted label $\bar{y} = j$. Given loss $\ell$ and network function $f$, Backward Correction is defined as $\ell^{\leftarrow}(f(x),\bar{y}) = [T^{-1}\ell_{y|f(x)}]_{\bar{y}}$, where $\ell_{y|f(x)} = (\ell(f(x),1),\ldots,\ell(f(x),k))$. Then, the corrected loss $\ell^{\leftarrow}(f(x),\bar{y})$ i
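
To make the correction concrete, the following is a minimal sketch of backward correction in the form $\ell^{\leftarrow}(f(x),\bar{y}) = [T^{-1}\ell_{y|f(x)}]_{\bar{y}}$, not the authors' implementation. Several choices are assumptions made for illustration: the loss is cross-entropy, the noise is uniform with off-diagonal entries `noise_rate / (k - 1)`, and the helper names (`uniform_transition`, `invert`, `backward_corrected_loss`) are hypothetical. The final lines check numerically that averaging the corrected loss over the noisy labels generated from a clean label recovers the clean loss.

```python
import math

def uniform_transition(k, noise_rate):
    """T[i][j] = p(noisy = j | clean = i) under uniform noise (assumed form)."""
    off = noise_rate / (k - 1)
    return [[1.0 - noise_rate if i == j else off for j in range(k)]
            for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; raises if T is singular."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("T is singular; backward correction is undefined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cross_entropy_vector(probs):
    """Loss of the prediction against each possible label: (l(f(x),1..k))."""
    return [-math.log(p) for p in probs]

def backward_corrected_loss(probs, noisy_label, T_inv):
    """Entry `noisy_label` of T^{-1} applied to the per-label loss vector."""
    losses = cross_entropy_vector(probs)
    return sum(T_inv[noisy_label][j] * losses[j] for j in range(len(losses)))

# Illustrative values only: 3 classes, 35% uniform noise, one softmax output.
T = uniform_transition(3, 0.35)
T_inv = invert(T)
probs = [0.7, 0.2, 0.1]
clean = 0

# Expectation of the corrected loss over noisy labels drawn from row T[clean].
expected_corrected = sum(
    T[clean][j] * backward_corrected_loss(probs, j, T_inv) for j in range(3)
)
clean_loss = cross_entropy_vector(probs)[clean]
print(abs(expected_corrected - clean_loss) < 1e-9)
```

Note that individual corrected losses can be negative because $T^{-1}$ has negative entries; only their expectation over the noisy labels matches the clean loss.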

Figures (6)

  • Figure 1: We empirically demonstrate the generalization difference between original $\ell$ and corrected $\tilde{\ell}$ (cf. the theorem labeled `fw-theorem` in the section labeled `sec:bfcorr`). We choose MNIST with 35% uniform noise as noisy data. There is an obvious gap between $\ell$ and $\tilde{\ell}$ on noisy MNIST.
  • Figure 2: A taxonomy of LNRL based on the focus of each method. For each technique branch, we list a few representative works here.
  • Figure 3: Two representatives of transition matrix $T$.
  • Figure 4: A general case of adaptation layer.
  • Figure 5: A simulated experiment across different noise rates ($0\%$-$80\%$). We chose MNIST with uniform noise as noisy data. The solid lines denote the training accuracy, while the dotted lines denote the validation accuracy.
  • ...and 1 more figure

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • Remark
  • Remark
  • Definition 3
  • Theorem 1
  • Remark
  • Theorem 2
  • Remark