Deconstructing the Failure of Ideal Noise Correction: A Three-Pillar Diagnosis

Chen Feng, Zhuo Zhi, Zhao Huang, Jiawei Ge, Ling Xiao, Nicu Sebe, Georgios Tzimiropoulos, Ioannis Patras

Abstract

Statistically consistent methods based on the noise transition matrix ($T$) offer a theoretically grounded solution to Learning with Noisy Labels (LNL), with guarantees of convergence to the optimal clean-data classifier. In practice, however, these methods are often outperformed by empirical approaches such as sample selection, and this gap is usually attributed to the difficulty of accurately estimating $T$. The common assumption is that, given a perfect $T$, noise-correction methods would recover their theoretical advantage. In this work, we put this longstanding hypothesis to a decisive test. We conduct experiments under idealized conditions, providing correction methods with a perfect, oracle transition matrix. Even under these ideal conditions, we observe that these methods still suffer from performance collapse during training. This compellingly demonstrates that the failure is not fundamentally a $T$-estimation problem, but stems from a more deeply rooted flaw. To explain this behaviour, we provide a unified analysis that links three levels: macroscopic convergence states, microscopic optimisation dynamics, and information-theoretic limits on what can be learned from noisy labels. Together, these results give a formal account of why ideal noise correction fails and offer concrete guidance for designing more reliable methods for learning with noisy labels.
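
To make the oracle-$T$ setting concrete, the following is a minimal sketch of one common formulation of forward correction: the model's clean-class probabilities are pushed through $T$ before the cross-entropy against the noisy label is taken. The function names, the NumPy implementation, and the convention that the noise rate is spread evenly over the other $K-1$ classes are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def symmetric_transition_matrix(num_classes, noise_rate):
    """Oracle T with T[i, j] = P(noisy label = j | clean label = i).

    Assumed convention: the noise rate is split evenly over the
    other num_classes - 1 labels.
    """
    off_diag = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diag)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward_corrected_loss(logits, noisy_labels, T):
    """Mean cross-entropy between T-corrected predictions and noisy labels."""
    clean_probs = softmax(logits)        # shape (N, K): predicted clean posterior
    noisy_probs = clean_probs @ T        # shape (N, K): implied noisy posterior
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.log(picked).mean())

# Hypothetical usage: 10 classes, 50% symmetric noise, uninformative logits.
T = symmetric_transition_matrix(10, 0.5)
logits = np.zeros((4, 10))
noisy_labels = np.array([0, 1, 2, 3])
loss = forward_corrected_loss(logits, noisy_labels, T)
```

Setting `noise_rate` to 0 makes $T$ the identity, in which case the loss reduces to ordinary cross-entropy, which is the "No Correction" baseline in this sketch.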

Paper Structure

This paper contains 74 sections, 3 theorems, 58 equations, 5 figures, and 2 tables.

Key Result

Theorem 4.2

Let $f_{\mathtt{NC}}$ and $f_{\mathtt{FC}}$ be the ideal population minimizers of the No Correction and Forward Correction risks, respectively.

Figures (5)

  • Figure 1: Test accuracy on CIFAR-10 with 50% symmetric noise.
  • Figure 2: Test accuracy on CIFAR-100 with 50% symmetric noise.
  • Figure 4: ACC and ECE comparison on CIFAR datasets under Ideal Fitted Case.
  • Figure 5: Comparison of Accuracy and ECE for CIFAR-10 on multi-labeled dataset.
  • Figure 6: Gradient vector field of the FC loss on a 3-class simplex. We denote the clean label vertex as $A$ ($\mathbf{e}_{y^*}$), the noisy label vertex as $B$ ($\mathbf{e}_{y^n}$), and the theoretical FC optimum as $C$ ($\mathbf{e}_{k_{\mathtt{FC}}^*}$). The vector field confirms that the global minimum is at $C$. However, the noisy vertex $B$ acts as a strong, non-optimal attractor. The vanishing gradient magnitude ("dead zone") near $B$ traps SGD, leading to the "pseudo-convergence" analyzed in the appendix (gradient-flow proof).
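
The "dead zone" described for Figure 6 can be illustrated with a small numerical sketch. It evaluates the gradient of a single-sample forward-corrected loss with respect to the logits at two points of a 3-class simplex. The transition matrix values, the two evaluation points, and the function name are hypothetical choices for illustration; this is not the setup used to produce the paper's figure, and it covers only one sample rather than the population risk.

```python
import numpy as np

def fc_logit_gradient(p, noisy_label, T):
    """Gradient of -log((T^T p)[noisy_label]) w.r.t. logits z, with p = softmax(z)."""
    q = p @ T[:, noisy_label]            # corrected probability of the noisy label
    grad_p = -T[:, noisy_label] / q      # dL/dp
    # Apply the softmax Jacobian (diag(p) - p p^T) to dL/dp.
    return p * (grad_p - p @ grad_p)

# Hypothetical 3-class symmetric-noise matrix (assumed values).
T = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])
noisy_label = 1                          # vertex B in the figure's notation

center = np.array([1 / 3, 1 / 3, 1 / 3])
near_noisy_vertex = np.array([0.005, 0.99, 0.005])

grad_center = fc_logit_gradient(center, noisy_label, T)
grad_near = fc_logit_gradient(near_noisy_vertex, noisy_label, T)

norm_center = np.linalg.norm(grad_center)
norm_near = np.linalg.norm(grad_near)
```

In this sketch the descent direction at both points still raises the noisy-label logit, while `norm_near` is far smaller than `norm_center`: the softmax Jacobian shrinks toward zero as the prediction approaches a vertex, so updates near the noisy vertex become very small.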

Theorems & Definitions (7)

  • Definition 4.1: Population-Level Consistency Partition
  • Theorem 4.2: Optimality and Consistency Gap under Ideal Fitting
  • Theorem 4.3: Accuracy Trade-off and Solution Collapse under Memorization
  • Theorem 4.4: Fundamental Information Cost of Label Noise
  • proof
  • proof
  • proof