Convergence Behavior of an Adversarial Weak Supervision Method

Steven An, Sanjoy Dasgupta

TL;DR

This paper investigates the convergence properties of the Balsubramani-Freund (BF) adversarial weak supervision framework, which poses a worst-case, log-loss minimax game over a coherence polytope $P$ defined by rule-accuracy and class-frequency bounds. The BF learner's prediction $g^{bf}$ is shown to be the maximum-entropy distribution in $P$ and to lie in an exponential-family class $\mathcal{G}$, which makes it equivalent to a regularized multiclass logistic regression. An uncertainty decomposition shows that BF's consistency comes from its ability to drive approximation error to zero as the bounds tighten, and it yields the convergence rate $d(\eta, g^{bf}) \le d(\eta, g^{*}) + O(\|\epsilon\|_\infty)$. The paper also compares BF to the Dawid–Skene (DS) probabilistic approach, showing that DS can be inconsistent in EM-driven settings, while BF can dominate under appropriate bound tightening. Experiments on ten real datasets corroborate the theory, showing competitive or superior log-loss performance, and synthetic setups illustrate consistency through convergence to the true label distribution. Overall, the work provides a rigorous link between adversarial weak supervision and logistic-regression-like inference, with practical implications for reliable label aggregation when labeling functions are imperfect or abstain.

Abstract

Labeling data via rules-of-thumb and minimal label supervision is central to Weak Supervision, a paradigm subsuming subareas of machine learning such as crowdsourced learning and semi-supervised ensemble learning. By using this labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand-labeled data can be ameliorated. Approaches to combining the rules-of-thumb fall into two camps, reflecting different ideologies of statistical estimation. The most common approach, exemplified by the Dawid-Skene model, is based on probabilistic modeling. The other, developed in the work of Balsubramani-Freund and others, is adversarial and game-theoretic. We provide a variety of statistical results for the adversarial approach under log-loss: we characterize the form of the solution, relate it to logistic regression, demonstrate consistency, and give rates of convergence. On the other hand, we find that probabilistic approaches for the same model class can fail to be consistent. Experimental results are provided to corroborate the theoretical results.

Paper Structure (48 sections, 36 theorems, 205 equations, 16 figures, 8 tables, 1 algorithm)

Key Result

Theorem 1

The minimax game in Equation (eqn:bf_minmax_game) can be written equivalently as follows. The first expression defines a learner prediction $g^{bf}$, and the second defines an adversarial labeling $z^{*}$. Then $g^{bf}=z^{*}$, and both equal the maximum-entropy distribution in $P$, which is the optimal solution to the right-most expression.
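The maximum-entropy characterization can be illustrated numerically on a toy problem. The sketch below is not the authors' algorithm: it assumes binary labels, two hypothetical rules on eight hand-made points, exactly known rule accuracies (all $\epsilon_j = 0$), and no class-frequency constraints. Under those assumptions the maximum-entropy labeling has the logistic-regression-like form $g_i(y) \propto \exp(\sum_j \theta_j 1[h_j(x_i) = y])$, and the weights $\theta$ can be found by plain gradient descent on the dual objective. All names (`H`, `Y`, `predict`, `fit`) are illustrative.

```python
import math

# Hypothetical toy data: 8 points, 2 rules, binary labels.
H = [[0, 0, 1, 1, 0, 1, 0, 1],   # predictions of rule 1
     [0, 1, 1, 0, 0, 1, 1, 1]]   # predictions of rule 2
Y = [0, 0, 1, 1, 0, 1, 1, 0]     # true labels, used only to set bounds and evaluate
n, m, k = len(Y), len(H), 2

# Rule accuracies b_j, assumed exactly known (epsilon = 0 in this sketch).
b = [sum(H[j][i] == Y[i] for i in range(n)) / n for j in range(m)]

def predict(theta):
    """Return g with g[i][y] proportional to exp(sum_j theta_j * 1[h_j(x_i) == y])."""
    g = []
    for i in range(n):
        scores = [sum(theta[j] for j in range(m) if H[j][i] == y) for y in range(k)]
        top = max(scores)                      # subtract max for numerical stability
        w = [math.exp(s - top) for s in scores]
        total = sum(w)
        g.append([v / total for v in w])
    return g

def expected_accuracy(g):
    """Accuracy of each rule when labels are drawn from the soft labeling g."""
    return [sum(g[i][H[j][i]] for i in range(n)) / n for j in range(m)]

def fit(steps=2000, lr=1.0):
    """Gradient descent on the dual: mean log-partition minus theta . b."""
    theta = [0.0] * m
    for _ in range(steps):
        acc = expected_accuracy(predict(theta))
        # The dual gradient is (model accuracy - target accuracy) per rule.
        theta = [theta[j] - lr * (acc[j] - b[j]) for j in range(m)]
    return theta

theta = fit()
g = predict(theta)

# Average entropy of g, and the log-loss of g against the true labels.
avg_entropy = -sum(p * math.log(p) for row in g for p in row) / n
true_loss = -sum(math.log(g[i][Y[i]]) for i in range(n)) / n

print("theta:", theta)
print("rule accuracies under g:", expected_accuracy(g), "targets:", b)
print("average entropy:", avg_entropy, "log-loss on true labels:", true_loss)
```

In this equality-constrained setting the log-loss of $g$ is the same for every labeling that satisfies the accuracy constraints, so the loss against the true labels coincides with the average entropy of $g$. This is the toy analogue of the learner's prediction and the adversary's labeling coinciding at the maximum-entropy point.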

Figures (16)

  • Figure 1: Dawid-Skene Graphical Model
  • Figure 2: BF/OCDS loss breakdowns on the AwA dataset where $\mathcal{E}^{appr}_{ds,1} = d(\eta, g^{ds*}) - d(\eta, g^{*})$ is large. The green section (below solid line) is loss incurred by any prediction in $\mathcal{G}$.
  • Figure 3: BF/OCDS loss breakdowns on the SMS dataset where $\mathcal{E}^{appr}_{ds,1}$ is small.
  • Figure 4: Convergence of $g^{*}$ to $\eta$ on a synthetic dataset as $n$ increases.
  • Figure 5: Dawid-Skene Graphical Model
  • ...and 11 more figures

Theorems & Definitions (59)

  • Theorem 1
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 6
  • Theorem 7
  • Lemma 8: OCDS Weights
  • Lemma 9: Informal
  • Lemma 10
  • ...and 49 more