Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Wei Wang, Dong-Dong Wu, Ming Li, Jingxiong Zhang, Gang Niu, Masashi Sugiyama

TL;DR

This paper proposes the first benchmark for systematically comparing PU learning algorithms. It also identifies the internal label shift problem of unlabeled training data in the one-sample setting and proposes a simple yet effective calibration approach to ensure fair comparisons within and across algorithm families.

Abstract

Positive-unlabeled (PU) learning is a weakly supervised binary classification problem, in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, experimental settings are highly inconsistent, making it difficult to identify which algorithm performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other hand, PU learning involves different problem settings and corresponding solution families, i.e., the one-sample and two-sample settings. However, existing evaluation protocols are heavily biased towards the one-sample setting and neglect the significant difference between them. We identify the internal label shift problem of unlabeled training data for the one-sample setting and propose a simple yet effective calibration approach to ensure fair comparisons within and across families. We hope our framework will provide an accessible, realistic, and fair environment for evaluating PU learning algorithms in the future.
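
To make the difference between the two settings concrete, here is a small simulation sketch. It rests on one reading of the abstract, not on the paper's formal definition: in the one-sample setting, labeled positives are assumed to be picked out of a single pool drawn from the marginal distribution, so the remaining unlabeled data contain fewer positives than the marginal does; in the two-sample setting, the unlabeled set is assumed to be drawn independently from the marginal. The function names and parameters are hypothetical and do not come from the paper's benchmark code.

```python
import random


def one_sample_unlabeled_prior(n_total, class_prior, n_labeled_pos, seed=0):
    """Positive fraction among unlabeled data in an assumed one-sample setup.

    Assumption: one pool of n_total points is drawn from the marginal, then
    n_labeled_pos positives are removed from it to form the labeled set.
    """
    rng = random.Random(seed)
    labels = [1 if rng.random() < class_prior else 0 for _ in range(n_total)]
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    if n_labeled_pos > len(pos_idx):
        raise ValueError("not enough positives in the pool to label")
    chosen = set(rng.sample(pos_idx, n_labeled_pos))
    # Whatever is not labeled stays in the unlabeled training set.
    remaining = [y for i, y in enumerate(labels) if i not in chosen]
    return sum(remaining) / len(remaining)


def two_sample_unlabeled_prior(n_unlabeled, class_prior, seed=0):
    """Positive fraction among unlabeled data in an assumed two-sample setup.

    Assumption: the unlabeled set is drawn from the marginal independently of
    the labeled positives, so nothing is removed from it.
    """
    rng = random.Random(seed)
    labels = [1 if rng.random() < class_prior else 0 for _ in range(n_unlabeled)]
    return sum(labels) / len(labels)


if __name__ == "__main__":
    os_prior = one_sample_unlabeled_prior(10_000, 0.4, 2_000)
    ts_prior = two_sample_unlabeled_prior(10_000, 0.4)
    print(f"one-sample unlabeled positive fraction: {os_prior:.3f}")
    print(f"two-sample unlabeled positive fraction: {ts_prior:.3f}")
```

Under these assumptions, removing 2,000 positives from a pool of 10,000 with class prior 0.4 leaves roughly 2,000 positives among 8,000 unlabeled points, so the unlabeled positive fraction falls to about 0.25 while the two-sample unlabeled set stays near 0.4. An algorithm that assumes the unlabeled data follow the test marginal would therefore see a shifted class prior in the one-sample case.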

Paper Structure

This paper contains 30 sections, 5 theorems, 27 equations, 8 figures, 18 tables, 1 algorithm.

Key Result

Proposition 1

For two classifiers $f_1$ and $f_2$ that satisfy $\mathbb{E}\left[\mathrm{PA}(f_1)\right]<\mathbb{E}\left[\mathrm{PA}(f_2)\right]$, we have $\mathrm{ACC}(f_1)<\mathrm{ACC}(f_2)$.
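
This summary does not reproduce Definition 1, so the sketch below is not the paper's PA. It only illustrates why a quantity computed without negative data can rank classifiers the same way accuracy does. It assumes the two-sample setting (unlabeled validation data drawn from the test marginal), a known class prior $\pi$, and a classifier that predicts positive when its score is non-negative. Under those assumptions, $\mathrm{ACC}(f)=2\pi\,\mathrm{TPR}(f)+\Pr_{U}\left[f(x)<0\right]-\pi$, and every term on the right can be estimated from positive and unlabeled data alone. The function name `pu_accuracy_estimate` is hypothetical.

```python
def pu_accuracy_estimate(scores_pos, scores_unl, class_prior):
    """Estimate accuracy from positive and unlabeled validation scores only.

    Assumptions (not taken from the paper): unlabeled scores come from the
    test marginal, class_prior is known, and the decision threshold is 0.
    """
    if not scores_pos or not scores_unl:
        raise ValueError("both score lists must be non-empty")
    # Fraction of labeled positives predicted positive (estimated TPR).
    tpr_hat = sum(s >= 0 for s in scores_pos) / len(scores_pos)
    # Fraction of unlabeled points predicted negative.
    neg_rate_unl = sum(s < 0 for s in scores_unl) / len(scores_unl)
    return 2 * class_prior * tpr_hat + neg_rate_unl - class_prior


if __name__ == "__main__":
    estimate = pu_accuracy_estimate(
        scores_pos=[1.0, 2.0, -1.0, 3.0],
        scores_unl=[-1.0, -2.0, 1.0, -3.0, 2.0],
        class_prior=0.4,
    )
    print(f"estimated accuracy: {estimate:.2f}")
```

Because the estimate differs from accuracy only by sampling noise under the stated assumptions, comparing two classifiers by its expectation orders them the same way as comparing them by accuracy, which is the kind of guarantee Proposition 1 states for PA.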

Figures (8)

  • Figure 1: An example of the comparison of the distribution of unlabeled training data in different PU learning settings.
  • Figure 2: Classification accuracies of TS PU learning algorithms in OS and TS settings of a PU version of CIFAR-10 with varying amounts of positive data. Figures (a) to (f) are for Case 1, and Figures (g) to (l) are for Case 2.
  • Figure 3: Classification accuracies of TS PU learning algorithms in OS and TS settings of a PU version of ImageNette with varying amounts of positive data. Figures (a) to (e) are for Case 1, and Figures (f) to (j) are for Case 2.
  • Figure 4: Overall performance w.r.t. accuracy and the F1 score across all datasets. Hyperparameters were tuned using PA, PAUC and OA, respectively; bar colors indicate means.
  • Figure 5: Overall performance w.r.t. the AUC score of different algorithms across all datasets. Hyperparameters were tuned using PA, PAUC and OA, respectively; bar colors indicate means.
  • ...and 3 more figures

Theorems & Definitions (15)

  • Definition 1: Proxy accuracy (PA)
  • Proposition 1
  • Definition 2: Proxy AUC score (PAUC)
  • Proposition 2
  • Definition 3: Oracle accuracy (OA)
  • Definition 4: Internal label shift in OS PU learning
  • Theorem 1
  • Theorem 2
  • proof
  • proof
  • ...and 5 more