Pi-DUAL: Using Privileged Information to Distinguish Clean from Noisy Labels

Ke Wang, Guillermo Ortiz-Jimenez, Rodolphe Jenatton, Mark Collier, Efi Kokiopoulou, Pascal Frossard

TL;DR

Pi-DUAL tackles label noise by exploiting privileged information to separate clean-label learning from noise fitting. It introduces a logit-level dual-path architecture with a PI-driven gating mechanism that routes samples to a prediction network or a noise network, enabling end-to-end training on large datasets. The method achieves state-of-the-art test accuracy on PI-rich benchmarks such as CIFAR-10H and ImageNet-PI and provides strong post-training noise-detection capabilities, while remaining scalable and easy to implement. Theoretical and empirical analyses show Pi-DUAL’s robustness to label noise and its effectiveness across varying PI quality, underscoring its practical utility for real-world noisy-label settings.

Abstract

Label noise is a pervasive problem in deep learning that often compromises the generalization performance of trained models. Recently, leveraging privileged information (PI) -- information available only during training but not at test time -- has emerged as an effective approach to mitigate this issue. Yet, existing PI-based methods have failed to consistently outperform their no-PI counterparts in terms of preventing overfitting to label noise. To address this deficiency, we introduce Pi-DUAL, an architecture designed to harness PI to distinguish clean from wrong labels. Pi-DUAL decomposes the output logits into a prediction term, based on conventional input features, and a noise-fitting term influenced solely by PI. A gating mechanism steered by PI adaptively shifts focus between these terms, allowing the model to implicitly separate the learning paths of clean and wrong labels. Empirically, Pi-DUAL achieves significant performance improvements on key PI benchmarks (e.g., +6.8% on ImageNet-PI), establishing a new state-of-the-art test set accuracy. Additionally, Pi-DUAL is a potent method for identifying noisy samples post-training, outperforming other strong methods at this task. Overall, Pi-DUAL is a simple, scalable and practical approach for mitigating the effects of label noise in a variety of real-world scenarios with PI.
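The logit decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the three sub-networks are replaced by single linear maps, and the names `W_pred`, `W_noise`, `w_gate`, and both function names are hypothetical. The convex combination `gamma * prediction + (1 - gamma) * noise` mirrors the form of the targets in Theorem 1; the sigmoid gate is an assumption.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pi_dual_train_probs(x, a, W_pred, W_noise, w_gate):
    """Training-time forward pass (sketch with linear stand-in sub-networks).

    x: (n, d_x) regular features, a: (n, d_a) privileged information.
    Returns class probabilities for the noisy label and the gate values.
    """
    pred_logits = x @ W_pred               # prediction term, depends only on x
    noise_logits = a @ W_noise             # noise-fitting term, depends only on PI
    gamma = sigmoid(a @ w_gate)[:, None]   # PI-driven gate in (0, 1), assumed sigmoid
    logits = gamma * pred_logits + (1.0 - gamma) * noise_logits
    return softmax(logits), gamma[:, 0]

def pi_dual_predict_probs(x, W_pred):
    """Inference-time forward pass: only the prediction term is used, no PI."""
    return softmax(x @ W_pred)
```

At test time only `pi_dual_predict_probs` is called, which is why the model needs no PI once training is finished.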

Paper Structure

This paper contains 43 sections, 2 theorems, 20 equations, 8 figures, 16 tables.

Key Result

Theorem 1

Consider $n$ samples from the above Gaussian models with targets ${\bm{y}} = {\bm{\gamma}}^\star {\bm{X}} {\bm{w}}^\star + ({\bm{I}} - {\bm{\gamma}}^\star) {\bm{A}} {\bm{v}}^\star + {\mathbf{\varepsilon}}$. The contributions of the standard and PI features are respectively ${\bm{X}} {\bm{w}}^\star \

Figures (8)

  • Figure 1: Illustration of the architecture of Pi-DUAL. (Left) During training, Pi-DUAL fits the noisy target label $\tilde{y}$ by combining the output of a prediction network (which takes the regular features ${\bm{x}}$ as input) and a noise network (which takes the PI ${\bm{a}}$ as input). The outputs of these sub-networks are weighted based on the output of a gating network (which also has ${\bm{a}}$ as input) and then passed through a $\operatorname{softmax}$ operator to obtain the predictions. (Right) During inference, when only ${\bm{x}}$ is available, Pi-DUAL does not need access to PI and simply uses the prediction network to predict the clean target $y$.
  • Figure 2: Distribution for the prediction network's confidence on the observed noisy labels for several datasets, separated by correctly and wrongly labeled samples.
  • Figure 3: Training curves of Pi-DUAL and the cross-entropy baseline on different datasets. The first two rows show the training dynamics of the prediction network and the noise network, respectively. We plot the training accuracy on clean labels, the training accuracy on wrong labels, and the test accuracy separately.
  • Figure 4: Distributions of $\gamma_{\bm\psi}({\bm{a}})$ over training samples with correct and wrong labels on several datasets.
  • Figure 5: Examples of ImageNet-PI images that the gating network suggests are mislabeled. The first row shows samples whose annotated labels are actually wrong, and the second row shows samples whose labels are correct but that the gating network assumes to be wrong. Here, "label" denotes the annotation label $\tilde{y}$ and "pred" the prediction by $f_{\bm{\theta}}$.
  • ...and 3 more figures
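
Figure 2 suggests one simple way to use the trained prediction network for post-training noise detection: samples on which it assigns low confidence to the observed label are candidates for being mislabeled. The sketch below illustrates that idea only; the function name `flag_suspect_labels` and the threshold of 0.5 are assumptions, not values or procedures taken from the paper.

```python
import numpy as np

def flag_suspect_labels(probs, noisy_labels, threshold=0.5):
    """Flag samples whose observed label receives low confidence.

    probs: (n, k) class probabilities from the prediction network.
    noisy_labels: (n,) integer observed (possibly wrong) labels.
    threshold: hypothetical cut-off; in practice it would be tuned.
    Returns the per-sample confidence and a boolean "suspect" mask.
    """
    probs = np.asarray(probs, dtype=float)
    noisy_labels = np.asarray(noisy_labels, dtype=int)
    # Confidence assigned to the label that was actually observed.
    confidence = probs[np.arange(len(noisy_labels)), noisy_labels]
    return confidence, confidence < threshold

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 0, 1])
confidence, suspect = flag_suspect_labels(probs, labels)
# confidence is [0.9, 0.2, 0.4]; the second and third samples are flagged.
```

The gate values shown in Figure 4 could be thresholded in the same way, since they also separate correctly and wrongly labeled samples.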

Theorems & Definitions (2)

  • Theorem 1: Informal
  • Proposition 2