Learning from Reduced Labels for Long-Tailed Data

Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu

TL;DR

The paper tackles the high labeling cost and tail-class neglect in long-tailed classification by introducing Reduced Label (RL), a two-part weak labeling scheme that shrinks the candidate label set from $K$ classes to a smaller set of size $l$, keeps the tail classes fixed in that set, and provides a 'None' option. It then derives an unbiased risk estimator within the Long-Tailed Reduced Labels (LTRL) framework, supported by Theorem 1 and Theorem 2, and reinforces the approach with data augmentation and Mixup strategies. An estimation-error bound (Theorem 4) provides the theoretical guarantee, showing convergence as the amount of data grows. Extensive experiments on CIFAR-10-LT, CIFAR-100-LT, SVHN-LT, STL-10-LT, and ImageNet-200-LT show that LTRL consistently outperforms state-of-the-art SSL and PLL methods, with the largest gains on tail classes, while reducing labeling effort. Overall, the work offers a practical, theoretically grounded path to efficient, accurate learning on long-tailed data that preserves tail supervision.

Abstract

Long-tailed data is prevalent in real-world classification tasks and heavily relies on supervised information, which makes the annotation process exceptionally labor-intensive and time-consuming. Unfortunately, despite being a common approach to mitigate labeling costs, existing weakly supervised learning methods struggle to adequately preserve supervised information for tail samples, resulting in a decline in accuracy for the tail classes. To alleviate this problem, we introduce a novel weakly supervised labeling setting called Reduced Label. The proposed labeling setting not only avoids the decline of supervised information for the tail samples, but also decreases the labeling costs associated with long-tailed data. Additionally, we propose a straightforward and highly efficient unbiased framework with strong theoretical guarantees to learn from these Reduced Labels. Extensive experiments conducted on benchmark datasets including ImageNet validate the effectiveness of our approach, surpassing the performance of state-of-the-art weakly supervised methods.

Paper Structure (21 sections, 28 equations, 3 figures, 9 tables)

Figures (3)

  • Figure 1: Comparison between True Label and Reduced Label in CIFAR-100 dataset. Instead of precisely selecting the correct class label from a set of 100 labels, the Reduced Label only requires annotators to determine whether the limited set of candidate labels includes the correct class label or not. Here, the correct class label is boxed in red.
  • Figure 2: An example of the correct class label being present in or absent from the reduced label set.
  • Figure 3: Illustration of the proposed framework.
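
The labeling scheme described in the TL;DR and in the Figure 1 caption can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function name `make_reduced_label`, the uniform sampling of the non-tail candidates, and the use of Python's `None` to stand for the 'None' option are all hypothetical choices made for illustration.

```python
import random


def make_reduced_label(true_label, num_classes, tail_classes, set_size, rng):
    """Simulate one Reduced Label annotation (illustrative assumption).

    The candidate set has `set_size` entries: every tail class (fixed part)
    plus randomly drawn non-tail classes (random part). The simulated
    annotator returns the true label if it is among the candidates,
    otherwise None, standing in for the 'None' option.
    """
    tail = set(tail_classes)
    if set_size < len(tail) or set_size > num_classes:
        raise ValueError("set_size must lie between len(tail_classes) and num_classes")

    # Random part: sample the remaining slots from the non-tail classes.
    head_pool = [c for c in range(num_classes) if c not in tail]
    sampled = rng.sample(head_pool, set_size - len(tail))

    candidates = tail | set(sampled)
    answer = true_label if true_label in candidates else None
    return candidates, answer


if __name__ == "__main__":
    rng = random.Random(0)
    # 10 classes, classes 8 and 9 treated as tail, candidate set of size 4.
    for y in (9, 0, 3):
        candidates, answer = make_reduced_label(y, 10, {8, 9}, 4, rng)
        print(f"true={y} candidates={sorted(candidates)} answer={answer}")
```

Because the tail classes are always part of the candidate set in this sketch, a tail sample always receives its exact label, while a head sample receives it only when its class happens to be drawn into the random part.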