Weakly Supervised AUC Optimization: A Unified Partial AUC Approach

Zheng Xie, Yu Liu, Hao-Yuan He, Ming Li, Zhi-Hua Zhou

TL;DR

This work tackles AUC optimization under weak supervision by recasting various imperfect supervision scenarios as AUC risk minimization on contaminated data. The authors introduce WSAUC, a unified framework that expresses these risks as linear transformations of the clean PN risk, enabling a single ERM-based training pipeline across noisy-label, PU, MIL, and SSL settings. To improve robustness, they propose rpAUC, a two-way reversed partial AUC objective that filters out a proportion of each contaminated set to limit the influence of mislabeled instances, and they establish excess-risk and variance bounds to justify its stability. Empirical results across multiple datasets and weakly supervised settings show that WSAUC and rpAUC deliver strong and robust AUC performance, particularly when labels are scarce or corrupted, which matters for real-world applications with imperfect supervision.

Abstract

Since acquiring perfect supervision is usually difficult, real-world machine learning tasks often confront inaccurate, incomplete, or inexact supervision, collectively referred to as weak supervision. In this work, we present WSAUC, a unified framework for weakly supervised AUC optimization problems, which covers noisy label learning, positive-unlabeled learning, multi-instance learning, and semi-supervised learning scenarios. Within the WSAUC framework, we first frame the AUC optimization problems in various weakly supervised scenarios as a common formulation of minimizing the AUC risk on contaminated sets, and demonstrate that the empirical risk minimization problems are consistent with the true AUC. Then, we introduce a new type of partial AUC, specifically, the reversed partial AUC (rpAUC), which serves as a robust training objective for AUC maximization in the presence of contaminated labels. WSAUC offers a universal solution for AUC optimization in various weakly supervised scenarios by maximizing the empirical rpAUC. Theoretical and experimental results under multiple settings support the effectiveness of WSAUC on a range of weakly supervised AUC optimization tasks.
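The training step of WSAUC, maximizing the empirical rpAUC on two contaminated sets, can be sketched with a pairwise surrogate loss. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1: it assumes a linear scorer, a squared-hinge surrogate of the pairwise 0-1 loss, and a filter that drops the lowest-scoring α-fraction of the positive-leaning set A and the highest-scoring β-fraction of the negative-leaning set B, recomputed at every step after a short full-AUC warm-up. The names `auc`, `pairwise_sq_hinge_grad`, `train_rpauc`, and `draw`, and the synthetic Gaussian data, are all hypothetical.

```python
import numpy as np


def auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of pairs with f(pos) > f(neg); ties count 1/2."""
    p = np.asarray(scores_pos, dtype=float)
    n = np.sort(np.asarray(scores_neg, dtype=float))
    below = np.searchsorted(n, p, side="left")           # negatives strictly below each positive
    ties = np.searchsorted(n, p, side="right") - below    # negatives equal to each positive
    return (below.sum() + 0.5 * ties.sum()) / (p.size * n.size)


def pairwise_sq_hinge_grad(w, x_a, x_b, margin=1.0):
    """Mean squared-hinge loss over all (a, b) pairs and its gradient for f(x) = x @ w."""
    diff = (x_a @ w)[:, None] - (x_b @ w)[None, :]       # f(a) - f(b), shape (n_a, n_b)
    slack = np.maximum(0.0, margin - diff)                # nonzero for pairs not separated by the margin
    loss = np.mean(slack ** 2)
    coef = -2.0 * slack / slack.size                      # d loss / d diff for every pair
    grad = coef.sum(axis=1) @ x_a - coef.sum(axis=0) @ x_b
    return loss, grad


def train_rpauc(x_a, x_b, alpha, beta, lr=0.1, steps=200, warmup=20):
    """Gradient descent on the surrogate, restricted to the pairs the assumed rpAUC filter keeps."""
    w = np.zeros(x_a.shape[1])
    k_a = int(alpha * len(x_a))                           # number of A instances to drop
    k_b = int(beta * len(x_b))                            # number of B instances to drop
    for step in range(steps):
        keep_a, keep_b = np.arange(len(x_a)), np.arange(len(x_b))
        if step >= warmup:                                # plain full-AUC training during warm-up
            keep_a = np.argsort(x_a @ w)[k_a:]            # drop lowest-scoring alpha-fraction of A
            keep_b = np.argsort(x_b @ w)[: len(x_b) - k_b]  # drop highest-scoring beta-fraction of B
        _, grad = pairwise_sq_hinge_grad(w, x_a[keep_a], x_b[keep_b])
        w = w - lr * grad
    return w


rng = np.random.default_rng(0)


def draw(n, theta):
    """Synthetic set with positive proportion theta: positives ~ N((1,1), I), negatives ~ N((-1,-1), I)."""
    is_pos = rng.random(n) < theta
    centers = np.where(is_pos[:, None], 1.0, -1.0) * np.ones((n, 2))
    return centers + rng.normal(size=(n, 2))


x_a, x_b = draw(200, 0.8), draw(200, 0.2)                 # contaminated sets A and B
w = train_rpauc(x_a, x_b, alpha=0.2, beta=0.2)            # alpha = 1 - theta_A, beta = theta_B
test_auc = auc(draw(500, 1.0) @ w, draw(500, 0.0) @ w)    # evaluated on clean test data
print(f"clean test AUC: {test_auc:.3f}")
```

Setting `alpha = beta = 0` turns the loop into ordinary full-AUC surrogate training, which gives a simple baseline to compare against.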

Paper Structure (26 sections, 11 theorems, 63 equations, 3 figures, 5 tables, 1 algorithm)

Key Result

Theorem 1

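The impure AUC risk $R_{AB}$ over two contaminated distributions $p_A=\theta_A p_P+(1-\theta_A)p_N$ and $p_B=\theta_B p_P+(1-\theta_B)p_N$ can be rewritten in the unified formulation as

$$R_{AB} = a\,R_{PN} + b, \qquad a=\theta_A-\theta_B,$$

where the bias term $b=(1-a)/2$. Based on this formulation, the true risk $R_{PN}$ can be obtained from $R_{AB}$ with a linear transformation, $R_{PN}=(R_{AB}-b)/a$ for $a\neq 0$.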
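The linear relation in Theorem 1 can be checked numerically. The sketch below is a minimal Monte Carlo check under stated assumptions: the AUC risk is taken to be the pairwise 0-1 loss $\Pr[f(x)<f(x')]$ with $x\sim p_A$, $x'\sim p_B$ (ties counted as 1/2), A and B are mixtures with positive proportions $\theta_A$ and $\theta_B$, and a hypothetical scorer outputs $N(1,1)$ scores on positives and $N(0,1)$ scores on negatives. The names `auc_risk` and `sample_mixture` are illustrative, not from the paper.

```python
import numpy as np


def auc_risk(scores_a, scores_b):
    """Empirical pairwise 0-1 AUC risk: Pr[f(x) < f(x')] for x in A, x' in B; ties count 1/2."""
    a = np.asarray(scores_a, dtype=float)
    b = np.sort(np.asarray(scores_b, dtype=float))
    right = np.searchsorted(b, a, side="right")
    above = b.size - right                                # B scores strictly above each A score
    ties = right - np.searchsorted(b, a, side="left")     # B scores equal to each A score
    return (above.sum() + 0.5 * ties.sum()) / (a.size * b.size)


def sample_mixture(rng, n, theta):
    """Scores of the assumed scorer on a set whose positive proportion is theta."""
    is_pos = rng.random(n) < theta
    # Assumed score model: positives ~ N(1, 1), negatives ~ N(0, 1).
    return rng.normal(loc=is_pos.astype(float), scale=1.0)


rng = np.random.default_rng(0)
n = 20_000
theta_a, theta_b = 0.8, 0.3
a_coef = theta_a - theta_b                                # a = theta_A - theta_B
b_bias = (1 - a_coef) / 2                                 # b = (1 - a) / 2

r_pn = auc_risk(rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n))  # clean PN risk
r_ab = auc_risk(sample_mixture(rng, n, theta_a), sample_mixture(rng, n, theta_b))
r_pn_recovered = (r_ab - b_bias) / a_coef                 # invert the linear transformation

print(f"R_AB = {r_ab:.4f}, a * R_PN + b = {a_coef * r_pn + b_bias:.4f}")
print(f"R_PN = {r_pn:.4f}, recovered from R_AB = {r_pn_recovered:.4f}")
```

The two printed pairs should agree up to sampling error; the recovered $R_{PN}$ is noisier because the inversion divides by $a$, so the error grows as $\theta_A$ and $\theta_B$ approach each other.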

Figures (3)

  • Figure 1: Illustrations of ROC curves (black curves) and of AUC and partial AUC variants (blue shading under the ROC curves). OPAUC and TPAUC are alternative performance measures tailored to specific task needs. The proposed rpAUC aims at robust maximization of the full AUC under weak supervision.
  • Figure 2: Performance comparison of using AUC and rpAUC as training objective, under different noise ratios. Blue shading indicates the improvement in test AUC achieved by maximizing rpAUC during training.
  • Figure 3: Hyperparameter sensitivity. BL (baseline): rpAUC degenerates to full AUC when both $\alpha$ and $\beta$ are set to zero. TP (true proportion): the filtering proportions are identical to the true proportions, i.e., $\alpha=1-\theta_A$ and $\beta=\theta_B$.
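The filtering proportions in the Figure 3 caption can be illustrated with a small sketch. It assumes, as an illustration rather than the paper's exact definition, that A is the positive-leaning contaminated set and B the negative-leaning one, and that rpAUC discards the lowest-scoring α-fraction of A and the highest-scoring β-fraction of B before computing AUC on the remaining pairs; `auc` and `rp_auc` are hypothetical helper names. With α = β = 0 the function reduces to the full AUC, matching the BL setting.

```python
import numpy as np


def auc(scores_a, scores_b):
    """Empirical AUC between sets A and B: fraction of pairs with f(a) > f(b); ties count 1/2."""
    a = np.asarray(scores_a, dtype=float)
    b = np.sort(np.asarray(scores_b, dtype=float))
    below = np.searchsorted(b, a, side="left")            # B scores strictly below each A score
    ties = np.searchsorted(b, a, side="right") - below     # B scores equal to each A score
    return (below.sum() + 0.5 * ties.sum()) / (a.size * b.size)


def rp_auc(scores_a, scores_b, alpha, beta):
    """Hypothetical empirical rpAUC: AUC after dropping the lowest-scoring alpha-fraction
    of A and the highest-scoring beta-fraction of B."""
    if not (0.0 <= alpha < 1.0 and 0.0 <= beta < 1.0):
        raise ValueError("alpha and beta must lie in [0, 1)")
    a = np.sort(np.asarray(scores_a, dtype=float))
    b = np.sort(np.asarray(scores_b, dtype=float))
    k_a = int(np.floor(alpha * a.size))                    # instances removed from the bottom of A
    k_b = int(np.floor(beta * b.size))                     # instances removed from the top of B
    return auc(a[k_a:], b[: b.size - k_b])


# A: mostly positives plus one low-scoring negative; B: mostly negatives plus one high-scoring positive.
scores_a = [0.9, 0.8, 0.7, 0.1]
scores_b = [0.2, 0.3, 0.4, 0.95]
print(auc(scores_a, scores_b))                 # 0.5625: 9 of 16 pairs correctly ordered
print(rp_auc(scores_a, scores_b, 0.0, 0.0))    # 0.5625: identical to the full AUC
print(rp_auc(scores_a, scores_b, 0.25, 0.25))  # 1.0: the contaminating instances are filtered out
```

Here α = 0.25 and β = 0.25 equal the contamination rates of the toy sets, which corresponds to the TP setting in Figure 3.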

Theorems & Definitions (18)

  • Theorem 1: unified formulation
  • proof
  • Corollary 2: consistency of inaccurate case
  • proof
  • Corollary 3
  • Corollary 4
  • Corollary 5
  • Theorem 6
  • proof
  • Corollary 7
  • ...and 8 more