Addressing Long-Tail Noisy Label Learning Problems: a Two-Stage Solution with Label Refurbishment Considering Label Rarity

Ying-Hsuan Wu, Jun-Wei Hsieh, Li Xin, Shin-You Teng, Yi-Kuan Hsieh, Ming-Ching Chang

TL;DR

The paper tackles learning under simultaneous long-tail distributions and noisy labels by proposing LR^2, a two-stage framework. It first obtains unbiased representations via contrastive learning and makes preliminary predictions with a classifier trained using the BAlanced Noise-tolerant Cross-entropy (BANC) loss, then refurbishes the labels into soft labels and trains three specialized experts to handle many-shot, medium-shot, and few-shot classes. The soft-refurbished labels and the three-expert ensemble improve robustness to noise and class imbalance, while the initial contrastive stage keeps the backbone features unbiased. On simulated and real-world long-tail noisy datasets, the method reports state-of-the-art accuracy: 94.19% on CIFAR-10 and 77.05% on CIFAR-100 in synthetic settings, and 77.74% and 81.40% on Food-101N and Animal-10N, respectively.

Abstract

Real-world datasets commonly exhibit noisy labels and class imbalance, such as long-tailed distributions. While previous research addresses this issue by differentiating noisy and clean samples, reliance on information from predictions based on noisy long-tailed data introduces potential errors. To overcome the limitations of prior works, we introduce an effective two-stage approach by combining soft-label refurbishing with multi-expert ensemble learning. In the first stage of robust soft label refurbishing, we acquire unbiased features through contrastive learning, making preliminary predictions using a classifier trained with a carefully designed BAlanced Noise-tolerant Cross-entropy (BANC) loss. In the second stage, our label refurbishment method is applied to obtain soft labels for multi-expert ensemble learning, providing a principled solution to the long-tail noisy label problem. Experiments conducted across multiple benchmarks validate the superiority of our approach, Label Refurbishment considering Label Rarity (LR^2), achieving remarkable accuracies of 94.19% and 77.05% on simulated noisy CIFAR-10 and CIFAR-100 long-tail datasets, as well as 77.74% and 81.40% on real-noise long-tail datasets, Food-101N and Animal-10N, surpassing existing state-of-the-art methods.
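
The abstract's multi-expert stage assumes the classes can be grouped by how many samples they have. A minimal sketch of such a grouping follows; the function name `split_by_shot` and the thresholds `many_min` and `few_max` are illustrative assumptions, not values taken from the paper.

```python
from collections import Counter


def split_by_shot(labels, many_min=100, few_max=20):
    """Group class ids into many-, medium-, and few-shot sets by sample count.

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    counts = Counter(labels)
    groups = {"many": [], "medium": [], "few": []}
    for cls, n in sorted(counts.items()):
        if n > many_min:
            groups["many"].append(cls)
        elif n < few_max:
            groups["few"].append(cls)
        else:
            groups["medium"].append(cls)
    return groups


# Toy long-tailed label list: class 0 is the head, class 2 is the tail.
labels = [0] * 500 + [1] * 60 + [2] * 5
groups = split_by_shot(labels)
```

With noisy labels, the observed counts only approximate the true class frequencies, so a grouping like this is itself affected by label noise.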

Paper Structure

This paper contains 20 sections, 12 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: A fundamental challenge in long-tailed noisy label classification is the potential confusion between noisy labels in head classes and clean labels in tail classes. Data rarity in tail classes further complicates the label refurbishment in these instances.
  • Figure 2: The proposed approach tackles the long-tail noisy label learning problem through a two-stage process. Stage 1 involves initial prediction using contrastive learning and a newly designed BAlanced Noise-tolerant Cross-entropy (BANC) loss, followed by label refurbishment. Stage 2 employs ensemble learning using three expert modules, specifically designed to enhance long-tail classification.
  • Figure 3: Label refurbishment involves the use of soft labels determined by the confidence of the original labels. In this example, the prediction score of C3 is updated after normalization.
  • Figure A1: Impact of the scaling coefficient $c$ on classification accuracy is examined, with the optimal result observed when $c=6$. Note that this result solely represents predictions from the first stage and does not incorporate label refurbishment and multi-expert ensemble learning.
  • Figure A2: The effect of hyperparameter $\alpha$ on classification accuracy is investigated, and the optimal result is observed when $\alpha = 0.2$. Note that this outcome solely reflects predictions from the first stage and does not encompass label refurbishment and multi-expert ensemble learning.
  • ...and 1 more figure
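
The Figure 3 caption describes soft labels determined by the confidence of the original label and then normalized. One way this could look is sketched below. It assumes the confidence is simply the probability the model assigns to the given label, which the caption does not confirm; the function name `refurbish_label` is hypothetical, and the sketch does not model label rarity.

```python
def refurbish_label(probs, given_label):
    """Blend the given (possibly noisy) label with the model prediction.

    Assumption: confidence in the given label equals the probability the
    model assigns to it. The paper's actual rule may differ.
    """
    confidence = probs[given_label]
    # Scale the prediction by the remaining weight, then add the
    # confidence mass onto the given label's entry.
    soft = [(1.0 - confidence) * p for p in probs]
    soft[given_label] += confidence
    # Normalize so the soft label is a valid probability distribution.
    total = sum(soft)
    return [s / total for s in soft]


probs = [0.1, 0.2, 0.7]  # toy softmax output over classes C1, C2, C3
agree = refurbish_label(probs, given_label=2)     # label matches prediction
disagree = refurbish_label(probs, given_label=0)  # label looks noisy
```

When the given label agrees with a confident prediction, its entry is reinforced. When the two disagree, most of the mass stays with the predicted class and the given label keeps only a small share.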