Learning from Reduced Labels for Long-Tailed Data
Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu
TL;DR
The paper tackles the high labeling cost and the neglect of tail classes in long-tailed classification by introducing Reduced Label (RL), a two-part weak labeling scheme. It shrinks the candidate set shown to an annotator from all $K$ classes to a smaller set of $l$ classes in which the tail classes are always included, and it adds a 'None' option for samples whose true class is not in that set. The authors then derive an unbiased risk estimator within the Long-Tailed Reduced Labels (LTRL) framework, supported by Theorem 1 and Theorem 2, and strengthen the approach with data augmentation and Mixup. An estimation-error bound (Theorem 4) shows that the estimation error vanishes as the amount of training data grows. Extensive experiments on CIFAR-10-LT, CIFAR-100-LT, SVHN-LT, STL-10-LT, and ImageNet-200-LT show that LTRL consistently outperforms state-of-the-art semi-supervised learning (SSL) and partial-label learning (PLL) methods, with the clearest gains on tail classes, while reducing labeling effort. Overall, the work offers a practical and theoretically grounded route to accurate learning on long-tailed data that preserves supervision for tail classes.
Abstract
Long-tailed data is prevalent in real-world classification tasks, and learning from it relies heavily on supervised information, which makes annotation exceptionally labor-intensive and time-consuming. Unfortunately, although weakly supervised learning is a common way to reduce labeling costs, existing weakly supervised methods struggle to preserve enough supervised information for tail samples, which lowers accuracy on the tail classes. To alleviate this problem, we introduce a novel weakly supervised labeling setting called Reduced Label. The proposed setting not only avoids losing supervised information for tail samples but also lowers the labeling cost of long-tailed data. Additionally, we propose a straightforward and highly efficient unbiased framework with strong theoretical guarantees for learning from these Reduced Labels. Extensive experiments on benchmark datasets, including ImageNet, validate the effectiveness of our approach, which surpasses state-of-the-art weakly supervised methods.
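To make the labeling protocol concrete, here is a minimal Python simulation of how a Reduced Label might be collected. It is a sketch under stated assumptions, not the authors' exact procedure: it assumes each sample is shown a candidate set of $l$ classes that contains every tail class plus randomly drawn non-tail classes to fill the remaining slots, and that the annotator either picks the true class from that set or answers 'None' (represented here by Python's `None`). The function names `sample_reduced_label_set` and `annotate` are hypothetical, and whether a fresh candidate set is drawn per sample is also an assumption.

```python
import random
from typing import List, Optional, Sequence


def sample_reduced_label_set(
    num_classes: int, tail_classes: Sequence[int], l: int, rng: random.Random
) -> List[int]:
    """Hypothetical helper: build a candidate set of size l that always
    contains every tail class, padded with randomly chosen non-tail classes."""
    tail = sorted(set(tail_classes))
    if not len(tail) <= l <= num_classes:
        raise ValueError("need len(tail_classes) <= l <= num_classes")
    tail_set = set(tail)
    non_tail = [c for c in range(num_classes) if c not in tail_set]
    # Fill the remaining l - |tail| slots with distinct non-tail classes.
    extra = rng.sample(non_tail, l - len(tail))
    return sorted(tail + extra)


def annotate(true_label: int, reduced_set: Sequence[int]) -> Optional[int]:
    """Simulated annotator: return the true class if it is shown,
    otherwise answer 'None' (encoded as Python None)."""
    return true_label if true_label in reduced_set else None


if __name__ == "__main__":
    rng = random.Random(0)
    # K = 10 classes, assume classes 7, 8, 9 are the tail, l = 5.
    candidates = sample_reduced_label_set(10, [7, 8, 9], 5, rng)
    print(candidates)
    print(annotate(8, candidates))  # tail sample: always labeled exactly
```

Under these assumptions, every tail sample receives its exact label because all tail classes are always in the candidate set, while the annotator only chooses among $l + 1$ options (the $l$ shown classes plus 'None') instead of $K$.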
