A General Framework for Learning from Weak Supervision

Hao Chen; Jindong Wang; Lei Feng; Xiang Li; Yidong Wang; Xing Xie; Masashi Sugiyama; Rita Singh; Bhiksha Raj

TL;DR

This work tackles the practical challenge of learning from arbitrary weak supervision by proposing GLWS, a unified EM framework that treats precise labels as latent and models diverse weak sources as a Non-deterministic Finite Automaton (NFA). By forming the product of the prediction sequence and the NFA, GLWS employs a forward-backward algorithm to solve the complete EM with linear time complexity in the sequence length, enabling scalable learning across many weak supervision forms. The main contributions are: (1) a general framework that accommodates instance partial labels, aggregate statistics, pairwise observations, and unlabeled data; (2) an NFA-based representation of weak supervision; (3) a linear-time forward-backward EM algorithm; and (4) extensive experiments showing state-of-the-art performance across 11 weak supervision settings and multiple datasets, with practical comments on convergence and runtime. This approach promises broad impact for the deployment of weakly supervised learning in real-world, data-limited, or privacy-constrained settings, aided by open-source code.

Abstract

Weakly supervised learning faces two obstacles to practical deployment: limited applicability across scenarios with diverse forms of weak supervision, and poor scalability caused by the complexity of existing algorithms. This paper introduces a general framework for learning from weak supervision (GLWS) together with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation that accommodates various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an algorithm that simplifies the EM computation using a Non-deterministic Finite Automaton (NFA) together with a forward-backward algorithm, reducing the time complexity from the quadratic or factorial cost often required by existing solutions to linear. Learning from arbitrary weak supervision is thereby reduced to modeling that supervision as an NFA. GLWS not only improves the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.
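To make the abstract's central idea concrete, here is a minimal sketch of a forward-backward pass over the product of a prediction sequence and an automaton, using label proportion ("exactly $m$ positives in the bag") as the weak signal. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, instance predictions are assumed independent given the inputs, labels are binary, and the automaton is assumed to have exactly one accepting path per accepted label sequence. A brute-force enumeration is included only as a reference to check the result.

```python
from itertools import product


def label_proportion_nfa(m):
    """Counting automaton (hypothetical helper): state s = 's positives so far'."""
    transitions = {}
    for s in range(m + 1):
        transitions[(s, 0)] = [s]          # a negative keeps the count
        if s < m:
            transitions[(s, 1)] = [s + 1]  # a positive increments the count
    return m + 1, transitions, 0, {m}      # n_states, transitions, start, accepting


def forward_backward(probs, n_states, transitions, start, accepting):
    """Return (p(y_j = 1 | weak label) for each j, p(weak label))."""
    L = len(probs)
    emit = [(1.0 - p, p) for p in probs]   # emit[j][y] = p(y_j = y | x_j)

    # alpha[j][s]: probability mass of label prefixes of length j ending in state s
    alpha = [[0.0] * n_states for _ in range(L + 1)]
    alpha[0][start] = 1.0
    for j in range(L):
        for s in range(n_states):
            for y in (0, 1):
                for t in transitions.get((s, y), []):
                    alpha[j + 1][t] += alpha[j][s] * emit[j][y]

    # beta[j][s]: probability that the remaining labels lead from s to acceptance
    beta = [[0.0] * n_states for _ in range(L + 1)]
    for s in accepting:
        beta[L][s] = 1.0
    for j in range(L - 1, -1, -1):
        for s in range(n_states):
            for y in (0, 1):
                for t in transitions.get((s, y), []):
                    beta[j][s] += emit[j][y] * beta[j + 1][t]

    z = beta[0][start]                     # probability of the weak label
    posteriors = []
    for j in range(L):
        num = 0.0
        for s in range(n_states):
            for t in transitions.get((s, 1), []):
                num += alpha[j][s] * emit[j][1] * beta[j + 1][t]
        posteriors.append(num / z)
    return posteriors, z


def brute_force(probs, m):
    """Exponential-time reference: enumerate all 2^L labelings with m positives."""
    z, num = 0.0, [0.0] * len(probs)
    for ys in product((0, 1), repeat=len(probs)):
        if sum(ys) != m:
            continue
        w = 1.0
        for p, y in zip(probs, ys):
            w *= p if y else 1.0 - p
        z += w
        for j, y in enumerate(ys):
            num[j] += w * y
    return [n / z for n in num], z


probs = [0.9, 0.2, 0.6]                    # illustrative model outputs
posteriors, z = forward_backward(probs, *label_proportion_nfa(2))
```

For a fixed automaton, the two passes visit each sequence position once, so the cost grows linearly with the sequence length, whereas the reference enumeration grows exponentially. In an EM loop, posteriors like these would play the role of the E-step targets.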

Paper Structure (39 sections, 2 theorems, 13 equations, 9 figures, 20 tables, 3 algorithms)

Introduction
Related Work
Learning from Weak Supervision
Towards the Unification of Weak Supervision
Method
Preliminaries
General Framework for Weak Supervision
Weak Supervision as NFA
The Forward-Backward Algorithm
Extension to Multi-Class or Multi-Label Scenarios
Experiments
Partial Labels
Aggregate Observations
Pairwise Observations
Unlabeled Data
...and 24 more sections

Key Result

Proposition 3.2

For weakly supervised learning problems, the training objectives can be derived from the EM formulation as:

Figures (9)

Figure 1: Average performance overview of the proposed method on 11 common weak supervision settings, compared to previous best methods (margins shown on the top of bars). GLWS is capable of learning from any weak supervision universally and effectively.
Figure 2: Overview of GLWS for learning from arbitrary weak supervision. We model weak supervision as a Non-deterministic Finite Automaton (NFA). By taking the product of the prediction sequence and NFA, we can utilize the forward-backward algorithm to solve the proposed complete EM formulation in linear time.
Figure 3: NFAs for common weak supervision types, for an input sequence of size $L$. (a) Partial labels, where the NFA has $L$ transitions, one for each input, with partial labels as symbols; (b) Multiple instances, whose NFA has 2 states and can transition to the accepting state only via $1$, ensuring at least one positive instance in the sequence; (c) Label proportion, whose NFA has $m+1$ states for $m$ positive samples in the sequence; (d) Pairwise comparison, whose NFA has 3 states and covers $\{(1, 1), (1, 0), (0, 0)\}$; (e) Pairwise similarity with confidence score $c$, whose NFA also has 3 states and covers $\{(1, 1), (0, 0)\}$; if $c$ is given, as in similarity confidence and confidence difference, each edge is weighted by $c$; (f) Pairwise dissimilarity with confidence $c$ for $\{(1, 0), (0, 1)\}$; (g) Positive confidence, whose NFA also has $L$ transitions weighted by confidence $c$; (h) Unlabeled data with class prior $p$, whose NFA is equivalent to an expected label count of $pn$.
Figure 4: Illustration of the forward pass and backward pass in the forward-backward algorithm to compute $p(y^j, w | \mathbf{x}^{1:L};\theta^t)$.
Figure 5: Convergence of accuracy with error bars on multiple instance learning with long input sequences. (a) CIFAR-10 with bag length distribution of $\mathcal{N}(20, 5)$; (b) CIFAR-100 with $\mathcal{N}(10, 2)$. Our method shows superior convergence with more stable training.
...and 4 more figures
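As a small illustration of the two-state multiple-instance automaton described in Figure 3(b), the sketch below runs a forward and a backward pass over a bag of binary predictions and returns the posterior of each instance being positive, given that the bag contains at least one positive. The function name `mil_posteriors` and the example probabilities are hypothetical, and instance predictions are assumed independent; this is a sketch of the idea rather than the paper's code.

```python
def mil_posteriors(probs):
    """p(y_j = 1 | bag has at least one positive) via a 2-state automaton.

    State 0: no positive seen yet. State 1: at least one positive (accepting).
    Reading a 1 moves to (or stays in) state 1; reading a 0 keeps the state.
    """
    L = len(probs)

    # Forward pass: alpha[j] = (mass in state 0, mass in state 1) after j labels.
    alpha = [(1.0, 0.0)]
    for p in probs:
        a0, a1 = alpha[-1]
        alpha.append((a0 * (1.0 - p), a0 * p + a1))

    # Backward pass: beta[j] = probability of ending in the accepting state,
    # starting from each state and reading labels j+1..L.
    beta = [(0.0, 1.0)]
    for p in reversed(probs):
        b0, b1 = beta[-1]
        beta.append(((1.0 - p) * b0 + p * b1, (1.0 - p) * b1 + p * b1))
    beta.reverse()

    z = beta[0][0]  # probability that the bag is positive
    posteriors = []
    for j, p in enumerate(probs):
        a0, a1 = alpha[j]
        # A label of 1 at position j+1 leads to state 1 from either state.
        posteriors.append((a0 + a1) * p * beta[j + 1][1] / z)
    return posteriors, z


bag = [0.1, 0.3, 0.5]  # illustrative instance-level positive probabilities
posteriors, z = mil_posteriors(bag)
```

For this particular automaton the result has a closed form, $p_j / (1 - \prod_i (1 - p_i))$, which makes it a convenient sanity check; the automaton view matters for weak signals, such as label proportion, where no such shortcut is available.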

Theorems & Definitions (4)

Proposition 3.2
Definition 3.3
Proposition 3.4
proof
