ESA: Example Sieve Approach for Multi-Positive and Unlabeled Learning

Zhongnian Li, Meng Wei, Peng Ying, Xinzheng Xu

TL;DR

The paper tackles learning from multi-positive and unlabeled (MPU) data, where flexible models can cause the minimum risk to shift. It introduces the Example Sieve Approach (ESA), which sieves training examples by their Certain Loss (CL) to form sieved datasets, and a biased but consistent ESA risk estimator for multi-class classification. The authors establish consistency and a generalization bound via Rademacher complexity, showing that the estimation error attains the optimal parametric convergence rate as the amount of data grows. Empirically, ESA outperforms state-of-the-art MPU/PU methods on the MNIST, Kuzushiji-MNIST, and CIFAR datasets and is robust to class-prior misspecification and distribution shifts, which highlights its practical value for weakly supervised multi-class learning.

Abstract

Learning from Multi-Positive and Unlabeled (MPU) data has attracted growing attention because of its practical applications. Unfortunately, the MPU risk also suffers from a shift of the minimum risk, particularly when the models are very flexible, as shown in Figure 1. In this paper, to alleviate this shift of the minimum risk, we propose an Example Sieve Approach (ESA) that selects examples for training a multi-class classifier. Specifically, we sieve out some examples during training by using the Certain Loss (CL) value of each example, and we analyze the consistency of the proposed risk estimator. Moreover, we show that the estimation error of the proposed ESA achieves the optimal parametric convergence rate. Extensive experiments on various real-world datasets show that the proposed approach outperforms previous methods.
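As orientation for the convergence claim in the abstract, the following is the generic form such a result usually takes in PU-style analyses built on Rademacher complexity. It is a template under stated assumptions, not the paper's theorem: $\hat{f}$ (the classifier obtained by minimizing the empirical risk estimator), $f^{*}$ (the minimizer of the true risk $R$ over the model class $\mathcal{F}$), and the sample sizes $n_m$ (multi-positive) and $n_u$ (unlabeled) are notation introduced here, and the paper's exact constants and terms are not reproduced.

```latex
R(\hat{f}) - R(f^{*}) = \mathcal{O}_p\!\left(\frac{1}{\sqrt{n_m}} + \frac{1}{\sqrt{n_u}}\right)
```

Under this reading, halving the error bound requires roughly four times as many multi-positive and unlabeled examples, since $1/\sqrt{4n} = \tfrac{1}{2} \cdot 1/\sqrt{n}$.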

Paper Structure

This paper contains 25 sections, 55 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of the shift of the minimum risk when learning from unlabeled data, particularly when complex models such as deep networks are used to train classifiers. Although some examples are misclassified, the decision boundary keeps passing through regions of lower loss, which shifts the minimum risk. Furthermore, although more examples with lower loss reduce the minimum risk, this shift of the minimum risk leads to overfitting when learning from multi-positive and unlabeled data.
  • Figure 2: Illustration of the procedure for generating the sieved dataset $D^s_m$. Each circle denotes an example, and $\sigma_m$ denotes the lower bound. If an example's certain loss is smaller than the lower bound, the example is sieved out during training.
  • Figure 3: Test classification accuracy for all classes on the benchmark dataset CIFAR-10 during training. MP $=(1, 2, 3)$ means that classes 1, 2, and 3 are taken as the multi-positive classes, and N $=0$ means that class 0 is taken as the negative class.
  • Figure 4: Classification accuracy for all classes and identification accuracy for the negative class under various perturbed mixture proportions. $\theta$ denotes the perturbation rate of the mixture proportion. MP $=(1, 2, 3)$ means that classes 1, 2, and 3 are taken as the multi-positive classes, and N $=4$ means that class 4 is taken as the negative class.
  • Figure 5: Classification accuracy for all classes and identification accuracy for the negative class with various lower bounds. $\sigma_m$ and $\sigma_u$ denote the CL lower bounds for multi-positive and unlabeled data, respectively. In this experiment, classes 1, 3, and 5 are taken as the multi-positive classes, and class 7 is taken as the negative class on the benchmark dataset MNIST.
  • ...and 1 more figure
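The sieving rule in the Figure 2 caption, applied with separate lower bounds $\sigma_m$ and $\sigma_u$ as in Figure 5, can be sketched as follows. This is a minimal illustration under stated assumptions: the per-example CL values are taken as given (the paper's formula for computing CL is not reproduced), an example whose CL equals the lower bound is kept, and the helper names `sieve_examples` and `build_sieved_dataset` as well as all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sieve_examples(cl_values: Sequence[float], lower_bound: float) -> Tuple[List[int], List[int]]:
    """Return (kept, sieved_out) index lists.

    Assumption (from the Figure 2 caption): an example is sieved out when its
    certain-loss (CL) value is strictly smaller than the lower bound; every
    other example stays in the sieved training set.
    """
    kept: List[int] = []
    sieved_out: List[int] = []
    for i, cl in enumerate(cl_values):
        if cl < lower_bound:
            sieved_out.append(i)
        else:
            kept.append(i)
    return kept, sieved_out


def build_sieved_dataset(examples: Sequence[T], cl_values: Sequence[float], lower_bound: float) -> List[T]:
    """Hypothetical helper: keep only the examples that survive the sieve."""
    if len(examples) != len(cl_values):
        raise ValueError("examples and cl_values must have the same length")
    kept, _ = sieve_examples(cl_values, lower_bound)
    return [examples[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical CL values; the paper's exact CL formula is not reproduced here.
    cl_mp = [0.02, 0.40, 0.15, 0.90]  # multi-positive examples
    cl_u = [0.30, 0.01, 0.55]         # unlabeled examples
    sigma_m, sigma_u = 0.10, 0.05     # hypothetical lower bounds

    print(sieve_examples(cl_mp, sigma_m))  # ([1, 2, 3], [0])
    print(sieve_examples(cl_u, sigma_u))   # ([0, 2], [1])
    print(build_sieved_dataset(["a", "b", "c", "d"], cl_mp, sigma_m))  # ['b', 'c', 'd']
```

Because the rule is a simple threshold, raising a lower bound can only sieve out more examples, never fewer; Figure 5 reports how accuracy varies with the choice of $\sigma_m$ and $\sigma_u$.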