
Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning

Miao Zhang, Junpeng Li, Changchun Hua, Yana Yang

TL;DR

The paper tackles multi-class positive-unlabeled (MPU) learning, where negatives are unavailable, by introducing CSMPU, a cost-sensitive framework that yields an unbiased estimate of the target risk for observed classes. It combines a per-class cost-sensitive one-vs-rest objective with a non-negativity correction to stabilize training on unlabeled data, and provides theoretical guarantees including a Rademacher-based generalization bound and bias analysis under class-prior misspecification. A principled class-prior estimation procedure (NP-lower bounds plus penalized L1 moment matching) supports accurate prior handling. Empirically, CSMPU achieves consistent improvements in accuracy and stability across eight diverse datasets and several negative-prior settings, validating its practicality for robust observed-class detection in MPU.
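The per-class objective with a non-negativity correction can be sketched in the style of non-negative PU risk estimation (nnPU). This is a minimal sketch under stated assumptions, not the authors' implementation. The sigmoid loss, the uniform weighting over observed classes, and the names `sigmoid_loss`, `clip_at_zero`, and `ovr_nn_pu_risk` are our own illustrative choices. For each observed class $i$, the positive term is weighted by $\pi_i$. The inferred-negative term $\mathbb{E}_U[\ell^{-}] - \pi_i\,\mathbb{E}_{P_i}[\ell^{-}]$ is passed through a correction, because its finite-sample value can become negative.

```python
import numpy as np


def sigmoid_loss(z, y):
    """Sigmoid loss 1 / (1 + exp(y * z)); note l(z, +1) + l(z, -1) = 1."""
    return 1.0 / (1.0 + np.exp(y * z))


def clip_at_zero(r):
    """nnPU-style correction max(0, r): convex and 1-Lipschitz."""
    return np.maximum(r, 0.0)


def ovr_nn_pu_risk(scores_pos, scores_unl, priors, correction=clip_at_zero):
    """Hypothetical cost-sensitive one-vs-rest PU risk with a non-negativity correction.

    scores_pos: list of k-1 arrays; scores_pos[i] has shape (n_i, k-1) and holds the
                scores g(x) of labeled samples from observed class i.
    scores_unl: array of shape (n_u, k-1) with scores of unlabeled samples.
    priors:     class priors pi_1..pi_{k-1} of the observed classes (assumed given).
    correction: convex 1-Lipschitz map applied to the inferred-negative term.
    """
    total = 0.0
    for i, pi in enumerate(priors):
        g_pos = scores_pos[i][:, i]  # column i evaluated on class-i positives
        g_unl = scores_unl[:, i]     # column i evaluated on unlabeled data
        pos_risk = pi * sigmoid_loss(g_pos, +1).mean()
        # (1 - pi_i) * E_rest[l^-] rewritten as E_U[l^-] - pi_i * E_{P_i}[l^-]
        neg_risk = sigmoid_loss(g_unl, -1).mean() - pi * sigmoid_loss(g_pos, -1).mean()
        total += pos_risk + correction(neg_risk)
    return float(total) / len(priors)  # uniform weights over observed classes
```

The `correction` argument is kept generic because Theorem 4 appears to consider a corrected risk with a convex $1$-Lipschitz $g$. Passing `correction=np.abs` gives another such choice.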

Abstract

Positive-Unlabeled (PU) learning considers settings in which only positive and unlabeled data are available, while negatives are missing or left unlabeled. This situation is common in real applications where annotating reliable negatives is difficult or costly. Despite substantial progress in PU learning, the multi-class case (MPU) remains challenging: many existing approaches do not ensure unbiased risk estimation, which limits performance and stability. We propose a cost-sensitive multi-class PU method based on adaptive loss weighting. Within the empirical risk minimization framework, we assign distinct, data-dependent weights to the positive and inferred-negative (from the unlabeled mixture) loss components so that the resulting empirical objective is an unbiased estimator of the target risk. We formalize the MPU data-generating process and establish a generalization error bound for the proposed estimator. Extensive experiments on eight public datasets, spanning varying class priors and numbers of classes, show consistent gains over strong baselines in both accuracy and stability.
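Why weighting the positive and unlabeled loss components can recover an unbiased risk can be checked numerically on a toy problem whose expectations are exact. This sketch uses the standard prior weights $\pi_i$ from unbiased PU learning. The paper's data-dependent weights may differ. The three-class discrete distributions, the priors, and the score function $g_i(x)=(i+1)\,x$ are illustrative assumptions.

```python
import numpy as np

# Toy, fully specified setting (all numbers are illustrative assumptions):
# classes 0 and 1 are observed, class 2 appears only inside the unlabeled mixture.
SUPPORT = np.array([-1.5, -0.5, 0.5, 1.5])
CLASS_PMFS = np.array([
    [0.10, 0.20, 0.30, 0.40],   # p_0(x)
    [0.40, 0.30, 0.20, 0.10],   # p_1(x)
    [0.25, 0.25, 0.25, 0.25],   # p_2(x), never labeled
])
PRIORS = np.array([0.3, 0.2, 0.5])
P_U = PRIORS @ CLASS_PMFS       # unlabeled density p_u(x) = sum_j pi_j p_j(x)


def expect(pmf, values):
    """Exact expectation of values (evaluated on SUPPORT) under a discrete pmf."""
    return float(pmf @ values)


def neg_loss_values(i):
    """l^-(g_i(x)) on SUPPORT for a hypothetical fixed score g_i(x) = (i + 1) * x."""
    scores = (i + 1) * SUPPORT
    return 1.0 / (1.0 + np.exp(-scores))  # sigmoid loss for the negative label


def inferred_negative_risk(i):
    """Return (target, PU estimate) of the negative-part risk for observed class i."""
    values = neg_loss_values(i)
    rest = [j for j in range(len(PRIORS)) if j != i]
    rest_pmf = PRIORS[rest] @ CLASS_PMFS[rest] / PRIORS[rest].sum()
    target = (1.0 - PRIORS[i]) * expect(rest_pmf, values)  # needs true negatives
    estimate = expect(P_U, values) - PRIORS[i] * expect(CLASS_PMFS[i], values)  # P_i and U only
    return target, estimate


for i in (0, 1):
    print(i, inferred_negative_risk(i))
```

For each observed class, the two printed numbers agree up to floating-point error. The estimate never touches labeled negatives.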

Paper Structure

This paper contains 21 sections, 7 theorems, 54 equations, 4 figures, and 5 tables.

Key Result

Theorem 1

Assume the data distribution consists of $k-1$ labeled (observed) classes with priors $\pi_i=p(y=i)$ for $i=1,\dots,k-1$, and an unlabeled mixture with density $p_u(\boldsymbol{x})=\sum_{j=1}^k \pi_j p_j(\boldsymbol{x})$. Then the CSMPU population risk is and under our normalization $C{=}1$ this reduces to the constant $-2(1-\pi_k)$.
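The mixture assumption in Theorem 1 is what makes unbiased estimation possible without negatives. The following derivation uses our own notation: $f$ is any integrable function, for example $f=\ell^{-}\circ g_i$. It shows how the risk contributed by all classes other than an observed class $i$ can be rewritten using only $P_i$ and the unlabeled distribution $U$:

```latex
\begin{aligned}
\mathbb{E}_{U}[f(\boldsymbol{x})]
  &= \sum_{j=1}^{k} \pi_j\,\mathbb{E}_{P_j}[f(\boldsymbol{x})]
   = \pi_i\,\mathbb{E}_{P_i}[f(\boldsymbol{x})] + \sum_{j\neq i} \pi_j\,\mathbb{E}_{P_j}[f(\boldsymbol{x})], \\
\sum_{j\neq i} \pi_j\,\mathbb{E}_{P_j}[f(\boldsymbol{x})]
  &= \mathbb{E}_{U}[f(\boldsymbol{x})] - \pi_i\,\mathbb{E}_{P_i}[f(\boldsymbol{x})].
\end{aligned}
```

Replacing each expectation with a sample mean keeps the estimate unbiased. For flexible models, however, the right-hand side can turn negative on finite samples, which is the usual motivation for a non-negativity correction.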

Figures (4)

  • Figure 1: Training and test curves of CSMPU on three datasets (MNIST, FashionMNIST, and KMNIST) with negative-class prior $\pi_k=0.2$. For each dataset, we plot training loss, test loss, training accuracy, and test accuracy versus epochs; curves are averaged over five independent runs. CSMPU exhibits smooth convergence without pronounced overfitting, contrasting with representative MPU baselines.
  • Figure 2: Classification performance under three class configurations (4, 6, and 8 classes), each evaluated with class priors of 0.2, 0.5, and 0.8.
  • Figure 3: Effect of class-prior misspecification on performance and bounds (FashionMNIST, $\pi=0.5$). (a) Macro-F1 versus misspecification magnitude $\lVert\Delta\rVert_{1}$ under two perturbation schemes (scalar-last and adversarial). We show $N{=}6$ (top) and $N{=}4$ (bottom). (b) Empirical (solid) and theoretical (dashed) excess-risk bounds as functions of $\lVert\Delta\rVert_{1}$ for $N\in\{4,6,8\}$. (c) Robust bandwidth: maximum admissible $\lVert\Delta\rVert_{1}$ under allowed performance drops of 1, 2, and 5 percentage points. Here $\Delta=\hat{\pi}-\pi$ with $\sum_i \Delta_i=0$, and the empirical bound is $\sum_i \lambda_i |b_i|\,|\Delta_i|$ with $b_i=\mathbb{E}_{P_i}[\ell^{+}(g_i(X))-\ell^{-}(g_i(X))]$ and $\lambda_i=1/N$; the theoretical bound is $2C_{\Delta}\sum_i \lambda_i |\Delta_i|$. All points show mean $\pm$ SD over three independent runs.
  • Figure 4: Diagnostics on MNIST under the MPU setting with the $k$-th class prior set to 0.2. Left: average margin for each (true, predicted) pair, where positive values indicate that the true class outscores its strongest rival on average; blank cells denote no support. Right: support counts for the same pairs, with diagonals indicating correct predictions and off-diagonals showing confusions. The two views together reveal where the model is confident versus fragile and how many samples underpin each estimate.
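The two matrices in Figure 4 can be reproduced from raw class scores. This is a minimal sketch. It assumes that a sample's margin is its true-class score minus the largest score among the remaining classes, which is our reading of "strongest rival". `margin_diagnostics` is a hypothetical helper name.

```python
import numpy as np


def margin_diagnostics(scores, y_true, num_classes):
    """Average margin and support count for each (true, predicted) pair.

    scores: (n, K) array of class scores; y_true: (n,) integer labels.
    Margin of a sample = true-class score minus the largest score among the other classes.
    Cells with no samples are NaN (the "blank cells" in Figure 4).
    """
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true)
    n = scores.shape[0]
    y_pred = scores.argmax(axis=1)
    true_scores = scores[np.arange(n), y_true]
    rivals = scores.copy()
    rivals[np.arange(n), y_true] = -np.inf          # exclude the true class
    margins = true_scores - rivals.max(axis=1)

    counts = np.zeros((num_classes, num_classes), dtype=int)
    sums = np.zeros((num_classes, num_classes))
    np.add.at(counts, (y_true, y_pred), 1)          # support per (true, predicted) cell
    np.add.at(sums, (y_true, y_pred), margins)      # accumulated margins per cell
    with np.errstate(invalid="ignore", divide="ignore"):
        avg = np.where(counts > 0, sums / counts, np.nan)
    return avg, counts
```

With this definition, diagonal cells hold non-negative margins and off-diagonal cells hold non-positive ones. The sign of the left matrix therefore mirrors the correct/confused split in the right matrix.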

Theorems & Definitions (9)

  • Theorem 1
  • Definition 1: Expected Rademacher complexity
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • Lemma 3: Uniform bound on misspecification bias
  • Theorem 3: Excess risk with class-prior misspecification
  • Remark 1: Single-number summary
  • Theorem 4: Corrected risk with convex $1$-Lipschitz $g$
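Lemma 3 and Theorem 3 bound the effect of class-prior misspecification, and the Figure 3 caption gives the empirical and theoretical bounds that are plotted. Both can be computed directly from per-class loss values. This is a minimal sketch: the function name and the input layout are assumptions. $C_{\Delta}$ is treated as a user-supplied constant because it is not defined in this summary.

```python
import numpy as np


def misspecification_bounds(loss_pos, loss_neg, pi_true, pi_hat, c_delta):
    """Empirical and theoretical bias bounds as written in the Figure 3 caption.

    loss_pos[i], loss_neg[i]: arrays of l^+(g_i(X)) and l^-(g_i(X)) for samples X ~ P_i.
    pi_true, pi_hat: true and misspecified class priors (Delta = pi_hat - pi_true).
    c_delta: the constant C_Delta of the theoretical bound (supplied by the caller).
    """
    n = len(pi_true)
    lam = np.full(n, 1.0 / n)                                  # lambda_i = 1/N
    delta = np.asarray(pi_hat, dtype=float) - np.asarray(pi_true, dtype=float)
    if not np.isclose(delta.sum(), 0.0):
        raise ValueError("Delta must sum to zero")
    # b_i = E_{P_i}[l^+(g_i(X)) - l^-(g_i(X))], estimated by a sample mean
    b = np.array([np.mean(np.asarray(lp) - np.asarray(ln)) for lp, ln in zip(loss_pos, loss_neg)])
    empirical = float(np.sum(lam * np.abs(b) * np.abs(delta)))
    theory = float(2.0 * c_delta * np.sum(lam * np.abs(delta)))
    return empirical, theory
```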
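Definition 1 introduces the expected Rademacher complexity that enters the generalization bound. As an illustration only, the sketch below estimates the empirical (fixed-sample) Rademacher complexity of a norm-bounded linear class. For this class the supremum has a closed form. The class is an assumption chosen for illustration and is not necessarily the hypothesis class the paper analyzes. Averaging the estimate over independent samples approximates the expected complexity.

```python
import numpy as np


def empirical_rademacher_linear(X, radius, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w||_2 <= radius} on the fixed sample X (shape (n, d)).

    The supremum has a closed form:
        sup_w (1/n) sum_i sigma_i <w, x_i> = (radius / n) * ||sum_i sigma_i x_i||_2,
    so only the expectation over Rademacher signs sigma is sampled.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))   # Rademacher sign vectors
    norms = np.linalg.norm(sigma @ X, axis=1)
    return radius * norms.mean() / n


def rademacher_upper_bound(X, radius):
    """Standard Jensen bound: radius * sqrt(sum_i ||x_i||^2) / n."""
    X = np.asarray(X, dtype=float)
    return radius * np.sqrt((X ** 2).sum()) / X.shape[0]
```

The closed-form supremum is why a linear class is convenient here. For richer classes the supremum generally has to be bounded rather than computed.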