Deconstructing the Failure of Ideal Noise Correction: A Three-Pillar Diagnosis

Chen Feng; Zhuo Zhi; Zhao Huang; Jiawei Ge; Ling Xiao; Nicu Sebe; Georgios Tzimiropoulos; Ioannis Patras

Abstract

Statistically consistent methods based on the noise transition matrix ($T$) offer a theoretically grounded solution to Learning with Noisy Labels (LNL), with guarantees of convergence to the optimal clean-data classifier. In practice, however, these methods are often outperformed by empirical approaches such as sample selection, and this gap is usually attributed to the difficulty of accurately estimating $T$. The common assumption is that, given a perfect $T$, noise-correction methods would recover their theoretical advantage. In this work, we put this longstanding hypothesis to a direct test: we run experiments under idealized conditions in which correction methods are given a perfect, oracle transition matrix. Even then, these methods still suffer from performance collapse during training. This demonstrates that the failure is not fundamentally a $T$-estimation problem but stems from a deeper flaw. To explain this behaviour, we provide a unified analysis that links three levels: macroscopic convergence states, microscopic optimisation dynamics, and information-theoretic limits on what can be learned from noisy labels. Together, these results give a formal account of why ideal noise correction fails and offer concrete guidance for designing more reliable methods for learning with noisy labels.

Paper Structure (74 sections, 3 theorems, 58 equations, 5 figures, 2 tables)

Introduction
Related Work
Noise Transition Matrix (T-Matrix).
Robust Loss Functions.
Empirically-Driven Sample Selection.
Problem Setting and Preliminaries
Problem Formulation
Learning Objectives
Risk Correction
Forward Correction (FC).
Evaluation Metrics
Theoretical Analysis of Noise Correction
Theoretical Framework
Comparison with Prior Art.
Macroscopic Analysis: Ideal vs. Overfitted States
...and 59 more sections

Key Result

Theorem 4.2

Let $f_{\mathtt{NC}}$ and $f_{\mathtt{FC}}$ be the ideal population minimizers of the No Correction and Forward Correction risks, respectively.
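
The theorem statement above is truncated, so the following does not reproduce its conclusion. It is only a minimal sketch of the two objects it names, under assumptions that are not taken from the source: a cross-entropy loss, a hypothetical 3-class symmetric transition matrix, and a made-up clean posterior at a single input. Under those assumptions, the population minimizer of the uncorrected risk is the noisy posterior, while the population minimizer of the forward-corrected risk is the clean posterior. All function and variable names are illustrative.

```python
import numpy as np

def symmetric_transition_matrix(num_classes: int, noise_rate: float) -> np.ndarray:
    """T[i, j] = P(noisy label = j | clean label = i) for symmetric noise."""
    off_diag = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diag)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def nc_risk(pred: np.ndarray, clean_posterior: np.ndarray, T: np.ndarray) -> float:
    """Expected cross-entropy against noisy labels with no correction."""
    noisy_posterior = clean_posterior @ T
    return float(-(noisy_posterior * np.log(pred)).sum())

def fc_risk(pred: np.ndarray, clean_posterior: np.ndarray, T: np.ndarray) -> float:
    """Expected cross-entropy after mapping the prediction through T (forward correction)."""
    noisy_posterior = clean_posterior @ T
    return float(-(noisy_posterior * np.log(pred @ T)).sum())

# Assumed setup: 3 classes, 50% symmetric noise, hypothetical clean posterior.
T = symmetric_transition_matrix(num_classes=3, noise_rate=0.5)
clean_posterior = np.array([0.80, 0.15, 0.05])
noisy_posterior = clean_posterior @ T

# Each risk is lowest at its own population minimizer.
print("NC risk at noisy vs clean posterior:",
      nc_risk(noisy_posterior, clean_posterior, T),
      nc_risk(clean_posterior, clean_posterior, T))
print("FC risk at clean vs noisy posterior:",
      fc_risk(clean_posterior, clean_posterior, T),
      fc_risk(noisy_posterior, clean_posterior, T))
```

The sketch works at the population level on purpose: it uses the exact noisy posterior rather than sampled labels, so it says nothing about the finite-sample training dynamics that the paper analyzes.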

Figures (5)

Figure 1: Test accuracy on CIFAR-10 with 50% symmetric noise.
Figure 2: Test accuracy on CIFAR-100 with 50% symmetric noise.
Figure 4: ACC and ECE comparison on CIFAR datasets under Ideal Fitted Case.
Figure 5: Comparison of Accuracy and ECE for CIFAR-10 on multi-labeled dataset.
Figure 6: Gradient vector field of the FC loss on a 3-class simplex. We denote the clean label vertex as $A$ ($\mathbf{e}_{y^*}$), the noisy label as $B$ ($\mathbf{e}_{y^n}$), and the theoretical FC optimum as $C$ ($\mathbf{e}_{k_{\mathtt{FC}}^*}$). The vector field confirms that the global minimum is at $C$. However, the noisy vertex $B$ acts as a strong, non-optimal attractor. The vanishing gradient magnitude ("dead zone") near $B$ traps SGD, leading to the "pseudo-convergence" analyzed in \ref{app:proof_gradient_flow}.
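
The kind of computation behind a gradient field like the one in Figure 6 can be sketched as follows. This is not the paper's configuration: the transition matrix `T_demo` is a hypothetical example chosen so that the per-sample FC optimum differs from the noisy vertex, and identifying $C$ with the row that maximizes the noisy-label column of $T$ is an assumption made for illustration. The sketch evaluates the closed-form gradient of the forward-corrected cross-entropy with respect to the softmax logits, $\partial \ell / \partial z_k = p_k \,(1 - T_{k,y^n} / q_{y^n})$ with $q = T^\top p$, at the simplex centre and near vertex $B$.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fc_loss(logits: np.ndarray, noisy_label: int, T: np.ndarray) -> float:
    """Forward-corrected cross-entropy: -log((softmax(z) @ T)[noisy_label])."""
    p = softmax(logits)
    return float(-np.log((p @ T)[noisy_label]))

def fc_logit_gradient(logits: np.ndarray, noisy_label: int, T: np.ndarray) -> np.ndarray:
    """Closed-form gradient of fc_loss w.r.t. the logits: p_k * (1 - T[k, y] / q_y)."""
    p = softmax(logits)
    q_y = (p @ T)[noisy_label]
    return p * (1.0 - T[:, noisy_label] / q_y)

# Hypothetical class indices and transition matrix (rows sum to 1).
A, B, C = 0, 1, 2            # clean vertex, noisy vertex, assumed FC optimum
T_demo = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.4, 0.4],
    [0.1, 0.6, 0.3],
])

# Loss at each vertex e_k is -log T[k, B]; the smallest value is the simplex minimum.
vertex_losses = -np.log(T_demo[:, B])

centre_logits = np.zeros(3)
near_B_logits = np.array([0.0, 8.0, 0.0])   # softmax almost saturated at B

grad_centre = fc_logit_gradient(centre_logits, B, T_demo)
grad_near_B = fc_logit_gradient(near_B_logits, B, T_demo)

print("vertex with lowest FC loss:", int(np.argmin(vertex_losses)))
print("gradient norm at centre:   ", np.linalg.norm(grad_centre))
print("gradient norm near B:      ", np.linalg.norm(grad_near_B))
```

In this toy setting the loss is lowest at $C$, yet the logit-space gradient near $B$ is far smaller than at the centre, because every term carries a factor $p_k$ that is either close to zero or multiplied by a factor close to zero once the softmax saturates. That is one ingredient of a dead zone; whether it matches the mechanism analyzed in the paper's appendix is not something this sketch establishes.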

Theorems & Definitions (7)

Definition 4.1: Population-Level Consistency Partition
Theorem 4.2: Optimality and Consistency Gap under Ideal Fitting
Theorem 4.3: Accuracy Trade-off and Solution Collapse under Memorization
Theorem 4.4: Fundamental Information Cost of Label Noise
proof
proof
proof
