
A Unified and Stable Risk Minimization Framework for Weakly Supervised Learning with Theoretical Guarantees

Miao Zhang, Junpeng Li, Changchun Hua, Yana Yang

TL;DR

Weakly supervised learning often yields unstable risk estimators across diverse supervision patterns. The authors propose EoERM, an Extension of ERM, which builds a stable surrogate risk that unifies PU, UU, multi-UU, CLL, PLL, and tuple-based supervision under a single objective using a symmetric loss and a linear operator model. They establish non-asymptotic generalization guarantees based on Rademacher complexity, quantify the impact of class-prior misspecification, and provide identifiability conditions for UU. Empirical results across MNIST, Fashion-MNIST, CIFAR-10, SVHN, and KMNIST show consistent gains and robustness without heuristic stabilization, demonstrating practical applicability.

Abstract

Weakly supervised learning has emerged as a practical alternative to fully supervised learning when complete and accurate labels are costly or infeasible to acquire. However, many existing methods are tailored to specific supervision patterns -- such as positive-unlabeled (PU), unlabeled-unlabeled (UU), complementary-label (CLL), partial-label (PLL), or similarity-unlabeled annotations -- and rely on post-hoc corrections to mitigate instability induced by indirect supervision. We propose a principled, unified framework that bypasses such post-hoc adjustments by directly formulating a stable surrogate risk grounded in the structure of weakly supervised data. The formulation naturally subsumes diverse settings -- including PU, UU, CLL, PLL, multi-class unlabeled, and tuple-based learning -- under a single optimization objective. We further establish a non-asymptotic generalization bound via Rademacher complexity that clarifies how supervision structure, model capacity, and sample size jointly govern performance. Beyond this, we analyze the effect of class-prior misspecification on the bound, deriving explicit terms that quantify its impact, and we study identifiability, giving sufficient conditions -- most notably via supervision stratification across groups -- under which the target risk is recoverable. Extensive experiments show consistent gains across class priors, dataset scales, and class counts -- without heuristic stabilization -- while exhibiting robustness to overfitting.

Paper Structure

This paper contains 28 sections, 9 theorems, 46 equations, 6 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Assume $p(x,y)$ is realizable: there exists a measurable $f^*:\mathcal{X}\to\mathcal{Y}$ such that $y=f^*(x)$ holds $p(x,y)$-almost surely. Let $\{f_n\}$ satisfy $f_n(x)\to f^*(x)$ for $p(x)$-almost every $x$. Suppose the loss $\ell:\mathbb{R}\times\mathcal{Y}\to\mathbb{R}_{\ge0}$ is continuous in i Then

Figures (6)

  • Figure 1: Training dynamics of ABS-UU and EoERM on four datasets over 100 epochs. Subplots (a)–(d) correspond to Fashion-MNIST, MNIST, KMNIST, and SVHN, respectively. The left $y$-axis shows training loss (solid), the right $y$-axis shows training accuracy (dashed). Blue: ABS-UU; red: EoERM. Shaded regions indicate standard deviation across runs. EoERM consistently attains higher accuracy and smoother optimization than ABS-UU.
  • Figure 2: Performance comparison of ABS-UU (baseline) and EoERM under varying class priors on Fashion-MNIST and MNIST. Each subplot shows four classification metrics, namely accuracy (blue), precision (gray), recall (orange), and F1 score (teal), as the target class prior varies from 0.1 to 0.5. Subplots (a) and (b) present results on Fashion-MNIST using ABS-UU and EoERM, respectively; subplots (c) and (d) show the corresponding results on MNIST. All metrics are reported as percentages (%). EoERM consistently maintains higher performance than ABS-UU across all metrics and priors, especially in recall and F1 score, indicating better robustness to label distribution shifts.
  • Figure 3: $\Delta$-scan on MNIST (UU). (a) Accuracy and (b) Macro-F1 as functions of the identifiability gap $\Delta = |\pi_{1}-\pi_{2}|$ (log scale). (c) Calibration and likelihood: Negative Log-Likelihood (top; lower is better) and Expected Calibration Error (bottom; %, lower is better). Curves show the mean across three seeds; thin lines denote individual seeds and shaded regions indicate $\pm$1 s.d. No temperature scaling is applied. As $\Delta$ increases (the task becomes easier), Accuracy/F1 improve, NLL monotonically decreases, and ECE remains small with a mild variation across $\Delta$.
  • Figure 4: Robustness of classification performance to prior estimation errors under the UU setting on MNIST. The x-axis denotes the multiplicative factor applied to the true class prior (i.e., noisy prior = true prior × 1.1, 1.2, or 1.3). The y-axis shows performance metrics in percentage: accuracy (blue), precision (gray), recall (orange), and F1 score (teal). Recall increases with overestimated priors, while precision decreases — a trade-off typical in threshold-sensitive classifiers. Accuracy and F1 remain relatively stable, indicating that the method exhibits moderate robustness to prior misspecification.
  • Figure 5: UU learning on MNIST. Comparison of four losses—hinge, logistic, ramp, and sigmoid: (a) training loss vs. epoch; (b) test accuracy vs. epoch; (c) precision–recall (PR) curves obtained by threshold sweeping. Class priors are set to $(0.9,\,0.1)$.
  • ...and 1 more figure
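
The TL;DR states that EoERM uses a symmetric loss, and Figure 5 compares hinge, logistic, ramp, and sigmoid losses. The paper's own definition of symmetry is not reproduced in this summary, so the sketch below assumes the common margin-loss condition $\ell(z) + \ell(-z) = \text{const}$ and checks it numerically for the four losses. The function names, the margin grid, and the tolerance are illustrative choices, not taken from the paper.

```python
import math

# Standard margin-loss definitions (z = y * f(x)); assumed forms, not quoted from the paper.
def hinge(z):
    return max(0.0, 1.0 - z)

def logistic(z):
    return math.log1p(math.exp(-z))

def ramp(z):
    return max(0.0, min(1.0, (1.0 - z) / 2.0))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(z))

LOSSES = {"hinge": hinge, "logistic": logistic, "ramp": ramp, "sigmoid": sigmoid}

def symmetry_spread(loss, margins):
    """Return max - min of loss(z) + loss(-z) over the given margins.

    A spread of (numerically) zero means the sum is constant on the grid,
    i.e. the loss satisfies the assumed symmetry condition there.
    """
    sums = [loss(z) + loss(-z) for z in margins]
    return max(sums) - min(sums)

def is_symmetric(loss, margins=None, tol=1e-9):
    """Numerically test the symmetry condition on a grid of margins."""
    if margins is None:
        margins = [i / 4.0 for i in range(-20, 21)]  # -5.0 to 5.0 in steps of 0.25
    return symmetry_spread(loss, margins) <= tol

if __name__ == "__main__":
    for name, loss in LOSSES.items():
        print(f"{name:8s} symmetric on grid: {is_symmetric(loss)}")
```

Under this assumed definition, the ramp and sigmoid losses pass the check while hinge and logistic do not. A grid check like this is only a sanity test on a finite set of margins, not a proof of symmetry.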

Theorems & Definitions (14)

  • Theorem 1: Risk convergence under realizability
  • Theorem 2: EoERM for binary weak supervision
  • Theorem 3: EoERM for multi-class weak supervision (OVA)
  • Definition 1
  • Theorem 4: Generalization error bound
  • Theorem 5: Generalization bound under misspecified priors for the stable risk
  • Remark 1
  • Corollary 1: If group weights are also estimated
  • Definition 2: UU setting and contrast
  • Lemma 1: Linear contrast identity
  • ...and 4 more