Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

Po-Hsuan Huang, Chia-Ching Lin, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

TL;DR

This work tackles image classification under instance-dependent label noise (IDN) by differentiating easy versus hard samples in addition to clean versus noisy labels. It introduces anchor hallucination to synthesize hard anchors from easy samples, enabling selection and label correction of hard samples, followed by semi-supervised training that leverages both corrected hard samples and easy samples. An iterative training procedure alternates between classifier optimization and hallucinator refinement, using a Gaussian Mixture Model for easy-sample selection, cosine-based anchor matching, and MixMatch-based SSL. Extensive experiments on synthetic IDN benchmarks and real-world datasets (CIFAR-10N/100N, Clothing1M) show consistent improvements over state-of-the-art NLL methods, highlighting the value of hard samples for shaping robust decision boundaries. The approach offers a new perspective on exploiting hard but clean data to improve robustness to IDN in practical settings and suggests avenues for broader application beyond image classification.

Abstract

Learning from noisy-labeled data is crucial for real-world applications. Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples. However, they often neglect that clean samples, especially those with intricate visual patterns, may also yield substantial losses. This oversight is particularly significant in datasets with Instance-Dependent Noise (IDN), where mislabeling probabilities correlate with visual appearance. Our approach explicitly distinguishes between clean vs. noisy and easy vs. hard samples. We identify training samples with small losses, assuming they have simple patterns and correct labels. Utilizing these easy samples, we hallucinate multiple anchors to select hard samples for label correction. Corrected hard samples, along with the easy samples, are used as labeled data in subsequent semi-supervised training. Experiments on synthetic and real-world IDN datasets demonstrate the superior performance of our method over other state-of-the-art NLL methods.
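
The small-loss selection of easy samples, which the TL;DR attributes to a Gaussian Mixture Model, might look roughly like the sketch below. It is a minimal illustration and not the authors' implementation: it assumes a two-component 1D mixture fitted to per-sample losses with plain EM, and the function names (`fit_two_gaussians`, `select_easy`), the 0.5 posterior threshold, and the iteration count are hypothetical choices made only for this example.

```python
import math
import random


def fit_two_gaussians(losses, iters=100):
    """Fit a 2-component 1D Gaussian mixture to per-sample losses with EM.

    Returns the component means and the responsibilities computed in the
    last E-step (one [p_component0, p_component1] pair per sample).
    """
    lo, hi = min(losses), max(losses)
    means = [lo, hi]  # initialise the two components at the extremes
    var0 = max((hi - lo) ** 2 / 4, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each loss
        resp = []
        for x in losses:
            p = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(losses)
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(var_k, 1e-6)  # variance floor (assumption)
    return means, resp


def select_easy(losses, threshold=0.5):
    """Indices whose posterior under the small-mean component exceeds threshold."""
    means, resp = fit_two_gaussians(losses)
    small = 0 if means[0] <= means[1] else 1
    return [i for i, r in enumerate(resp) if r[small] > threshold]


if __name__ == "__main__":
    random.seed(0)
    # Synthetic losses: 200 small-loss samples followed by 100 large-loss samples.
    demo = [abs(random.gauss(0.1, 0.05)) for _ in range(200)]
    demo += [random.gauss(2.0, 0.3) for _ in range(100)]
    print(len(select_easy(demo)), "samples selected as easy")
```

Under the paper's framing, the samples left out by this step are not all noisy: some are clean but hard, which is why the method goes on to recover them through anchor hallucination rather than treating them all as unlabeled.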

Paper Structure (14 sections, 3 equations, 3 figures, 5 tables)


Figures (3)

  • Figure 1: A schematic plot of our method for visual classification in comparison with classic NLL methods. (a) Classification on a noisy dataset with Instance-Dependent Noise (IDN). (b) Existing selection-based NLL methods, such as DivideMix (li20dividemix) and UNICON (karim21unicon), treat large-loss samples near the decision boundary that are hard to classify as unlabeled data. (c) Our proposed method identifies the hard samples and corrects their labels through anchor hallucination and selection.
  • Figure 2: Our NLL learning framework consists of two main training phases, namely the classification phase and the hallucinator training phase. The classification phase consists of four steps: (1) easy sample selection, (2) hard anchor hallucination, (3) hard sample selection, and (4) semi-supervised learning. The hallucinator model is updated in the hallucinator training phase.
  • Figure 3: The t-SNE visualization for the hallucinated anchors. We use darker colors to denote the hallucinated anchors and lighter colors for real samples. Note that most of the hallucinated anchors are distributed around the decision boundary.
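
Step (3) of Figure 2, hard sample selection, is described in the TL;DR as cosine-based anchor matching. The sketch below illustrates one plausible reading of that step and should not be taken as the paper's exact rule. It assumes that a sample is selected when its highest cosine similarity to any hallucinated anchor reaches a threshold, and that it then inherits that anchor's class label. The names `cosine` and `match_hard_samples` and the threshold `tau=0.9` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def match_hard_samples(features, anchors, anchor_labels, tau=0.9):
    """Return {sample_index: corrected_label} for samples close to an anchor.

    features      -- feature vectors of the samples not selected as easy
    anchors       -- feature vectors of the hallucinated hard anchors
    anchor_labels -- class label attached to each anchor
    tau           -- similarity threshold (an assumption for this sketch)
    """
    corrected = {}
    if not anchors:
        return corrected
    for i, feat in enumerate(features):
        sims = [cosine(feat, a) for a in anchors]
        best = max(range(len(anchors)), key=sims.__getitem__)
        if sims[best] >= tau:
            # The sample is treated as hard and relabeled with the anchor's class.
            corrected[i] = anchor_labels[best]
    return corrected


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    anchor_labels = [0, 1]
    features = [[0.9, 0.1], [0.1, 0.95], [1.0, 1.0]]
    print(match_hard_samples(features, anchors, anchor_labels))
```

Samples that match no anchor stay out of the labeled set, which is consistent with the semi-supervised stage consuming them as unlabeled data.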