Addressing Long-Tail Noisy Label Learning Problems: a Two-Stage Solution with Label Refurbishment Considering Label Rarity
Ying-Hsuan Wu, Jun-Wei Hsieh, Li Xin, Shin-You Teng, Yi-Kuan Hsieh, Ming-Ching Chang
TL;DR
The paper addresses learning when long-tailed class distributions and noisy labels occur together. It proposes LR^2, a two-stage framework. The first stage learns unbiased representations via contrastive learning and trains a robust classifier with the BAlanced Noise-tolerant Cross-entropy (BANC) loss. The second stage refurbishes the labels into soft labels and uses them to train three specialized experts for many-shot, medium-shot, and few-shot classes. Combining soft-refurbished labels with the three-expert ensemble improves robustness to both label noise and class imbalance, while the contrastive first stage keeps the backbone features unbiased. The method reports state-of-the-art results on simulated and real-world long-tail noisy datasets: 94.19% on CIFAR-10 and 77.05% on CIFAR-100 with simulated noise, and 77.74% on Food-101N and 81.40% on Animal-10N with real-world noise. These results suggest the approach is practical for real-world data that is both imbalanced and noisily labeled.
Abstract
Real-world datasets commonly exhibit both noisy labels and class imbalance, typically in the form of long-tailed distributions. Previous research addresses this issue by differentiating noisy from clean samples, but relying on predictions derived from noisy long-tailed data introduces potential errors. To overcome this limitation, we introduce an effective two-stage approach that combines soft-label refurbishing with multi-expert ensemble learning. In the first stage, robust soft-label refurbishing, we acquire unbiased features through contrastive learning and make preliminary predictions using a classifier trained with a carefully designed BAlanced Noise-tolerant Cross-entropy (BANC) loss. In the second stage, our label refurbishment method is applied to obtain soft labels for multi-expert ensemble learning, providing a principled solution to the long-tail noisy label problem. Experiments across multiple benchmarks validate our approach, Label Refurbishment considering Label Rarity (LR^2), which achieves accuracies of 94.19% and 77.05% on simulated noisy CIFAR-10 and CIFAR-100 long-tail datasets, and 77.74% and 81.40% on the real-noise long-tail datasets Food-101N and Animal-10N, surpassing existing state-of-the-art methods.
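To make the two ideas in the abstract concrete, the following is a minimal sketch and not the LR^2 implementation. It assumes the generic form of soft-label refurbishment, a convex blend of the given one-hot label with the classifier's predicted probabilities, and a count-based split of classes into many-shot, medium-shot, and few-shot groups. The function names, the `trust` weight, and the thresholds 100 and 20 are illustrative assumptions; how LR^2 actually accounts for label rarity and defines the BANC loss is not specified in this summary.

```python
def refurbish_label(given_label, predicted_probs, trust):
    """Blend a (possibly noisy) given label with model predictions.

    `trust` in [0, 1] is a hypothetical weight on the given label;
    the rule LR^2 uses to set it is not described here.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    num_classes = len(predicted_probs)
    # One-hot encoding of the given label.
    one_hot = [1.0 if c == given_label else 0.0 for c in range(num_classes)]
    # Convex combination: stays a valid distribution if predicted_probs is one.
    return [trust * o + (1.0 - trust) * p
            for o, p in zip(one_hot, predicted_probs)]


def split_by_shot(class_counts, many_threshold=100, few_threshold=20):
    """Group class indices by training-sample count.

    The thresholds are assumed values, not taken from the paper.
    """
    groups = {"many": [], "medium": [], "few": []}
    for cls, count in enumerate(class_counts):
        if count > many_threshold:
            groups["many"].append(cls)
        elif count < few_threshold:
            groups["few"].append(cls)
        else:
            groups["medium"].append(cls)
    return groups


# Example: the given label is class 0, but the model favors class 1.
soft = refurbish_label(0, [0.2, 0.7, 0.1], trust=0.5)
groups = split_by_shot([500, 50, 5])
```

In a setup like this, each of the three experts would be trained on the soft labels with emphasis on one of the groups returned by `split_by_shot`; the way the experts are combined into an ensemble is likewise left open here.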
