Class-Imbalanced Complementary-Label Learning via Weighted Loss

Meng Wei, Yong Zhou, Zhongnian Li, Xinzheng Xu

TL;DR

This work tackles the problem of learning from class-imbalanced complementary labels in multi-class classification. It proposes Weighted Complementary-Label Learning (WCLL), a weighted empirical-risk framework that adjusts losses by class-prior-derived weights to counteract imbalance, and proves a generalization bound showing convergence to the optimal solution as data grow. The approach yields consistent, significant improvements over state-of-the-art CLL methods on MNIST, CIFAR-10, Tiny-Imagenet, and a real-world DDSM dataset, demonstrating robust performance under both single- and multi-class imbalance. The results suggest that incorporating balanced weighting directly into the complementary-label loss is an effective strategy for handling imbalance in weakly supervised, label-uncertainty settings while maintaining theoretical guarantees.

Abstract

Complementary-label learning (CLL) is widely used in weakly supervised classification, but it faces a significant challenge on real-world datasets with class-imbalanced training samples. In such scenarios, one class has considerably fewer samples than the others, which degrades prediction accuracy. Existing CLL approaches have not investigated this problem. To address it, we propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification, together with a novel CLL approach called Weighted Complementary-Label Learning (WCLL). The proposed method models a weighted empirical risk minimization loss using the class-imbalanced complementary labels, and it also applies when multiple classes are imbalanced. Furthermore, we derive an estimation error bound to provide theoretical assurance. To evaluate our approach, we conduct extensive experiments on several widely used benchmark datasets and a real-world dataset, and compare our method with existing state-of-the-art methods. The proposed approach shows significant improvement on these datasets, even when multiple classes are imbalanced. Notably, the proposed method not only uses complementary labels to train a classifier but also addresses the problem of class imbalance.
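To make the problem setting concrete, the following minimal sketch builds complementary labels for a class-imbalanced toy dataset. It assumes the common CLL convention that each complementary label is drawn uniformly from the classes other than the true one; the abstract does not state the sampling scheme, so this is an assumption. The function name `make_complementary_labels` and the 61/19/20 class counts are illustrative only.

```python
import random
from collections import Counter


def make_complementary_labels(true_labels, num_classes, seed=0):
    """Draw one complementary label per sample.

    Assumption (not stated in the abstract): the complementary label is
    sampled uniformly from the classes the sample does NOT belong to.
    """
    rng = random.Random(seed)
    comp_labels = []
    for y in true_labels:
        candidates = [k for k in range(num_classes) if k != y]
        comp_labels.append(rng.choice(candidates))
    return comp_labels


# Hypothetical imbalanced toy set: class 0 is the majority class.
true_labels = [0] * 61 + [1] * 19 + [2] * 20
comp_labels = make_complementary_labels(true_labels, num_classes=3)

# The imbalance carries over in mirrored form: the majority class is
# rarely named as a complementary label, because only samples from the
# other classes can name it.
comp_counts = Counter(comp_labels)
print(comp_counts)
```

Under this assumption the complementary-label counts are themselves skewed, which is the situation a weighted loss is meant to correct.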

Paper Structure

This paper contains 21 sections, 4 theorems, 24 equations, 3 figures, 7 tables, and 1 algorithm.

Key Result

Lemma 1

For any $\delta > 0$, with probability at least $1 - \delta/2$, we have … where $R(f) = \mathbb{E}_{\bar{P}}$, $\hat{R}(f)$ denotes the empirical risk estimator of $R(f)$, $\mathfrak{R}_{N_l}(\mathcal{F})$ is the Rademacher complexity of $\mathcal{F}$ for a sample of size $N_l$ drawn from $\bar{P}(\textbf{x}, \bar{y})$, and $I = 2 \omega_K L_f[\pi_K(K-1)+K]$.
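
As a purely arithmetic illustration of the constant $I$ in Lemma 1, take the hypothetical values $K = 10$, $\pi_K = 0.1$, $\omega_K = 1$, and $L_f = 1$. These numbers are assumptions chosen for easy calculation, not values from the paper, and the symbols are treated here only as positive scalars:

$$I = 2\,\omega_K L_f\,[\pi_K(K-1) + K] = 2 \cdot 1 \cdot 1 \cdot [0.1 \cdot 9 + 10] = 21.8$$

For fixed $\omega_K$, $L_f$, and $\pi_K$, the constant grows linearly in the number of classes $K$.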

Figures (3)

  • Figure 1: Comparison of complementary labels and ground-truth labels on the DDSM dataset. To protect patient privacy, collecting complementary labels may be a more viable alternative to obtaining accurate ground-truth labels. In addition, the DDSM dataset exhibits an imbalanced distribution, with normal images constituting $61\%$, benign images $19\%$, and cancer images $20\%$. Here, "TL" refers to the ground-truth label, whereas "CL" denotes the complementary label.
  • Figure 2: The training process of the proposed method. The method computes weights for the class-imbalanced dataset and incorporates them into the complementary-label losses to achieve balanced class losses. During training, the method uses only complementary labels to train the neural network.
  • Figure 3: Experimental results of test classification accuracy on the MNIST and CIFAR-10 datasets over 5 trials. The dark colors show the mean accuracy of the imbalanced class and the light colors show the standard deviation of the imbalanced class. Here, p denotes the ratio between the classes in $\mathcal{T}_{maj}$ and $\mathcal{T}_{min}$. Im-class refers to the imbalanced class. The proposed method has the highest accuracy on the imbalanced class.
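
The pipeline described for Figure 2 (compute class weights, then fold them into the complementary-label loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the inverse-frequency weights, the surrogate loss $-\log(1 - p_{\bar{y}})$, and the helper names `class_weights` and `weighted_complementary_loss` are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_weights(counts):
    """Inverse-frequency weights (an assumed scheme, not the paper's).

    Rare classes get weights above 1, frequent classes below 1.
    """
    total = sum(counts)
    k = len(counts)
    return [total / (k * c) for c in counts]


def weighted_complementary_loss(batch_logits, comp_labels, weights):
    """Mean weighted loss over a batch of complementary-labelled samples.

    Each sample contributes weights[ybar] * -log(1 - p[ybar]), which
    pushes probability mass away from the complementary class ybar.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for logits, ybar in zip(batch_logits, comp_labels):
        p = softmax(logits)
        total += weights[ybar] * -math.log(1.0 - p[ybar] + eps)
    return total / len(comp_labels)


# Hypothetical per-class counts mirroring the 61/19/20 split quoted for DDSM.
weights = class_weights([61, 19, 20])

# One sample whose logits strongly favour class 0.
logits = [[5.0, 0.0, 0.0]]
loss_if_comp_is_0 = weighted_complementary_loss(logits, [0], [1.0, 1.0, 1.0])
loss_if_comp_is_1 = weighted_complementary_loss(logits, [1], [1.0, 1.0, 1.0])
print(weights, loss_if_comp_is_0, loss_if_comp_is_1)
```

With unit weights, the loss is large when the model puts high probability on the complementary class and small otherwise; the class weights then rescale each sample's contribution.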

Theorems & Definitions (6)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • proof
  • Theorem 2
  • proof