Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation

Shreyas Havaldar; Navodita Sharma; Shubhi Sareen; Karthikeyan Shanmugam; Aravindan Raghuveer

TL;DR

This work proposes a novel algorithmic framework for LLP binary classification that iterates between Pseudo Labeling and Embedding Refinement: the pseudo labels supervise a learner that yields a better embedding, with minimal computational overhead above standard supervised learning.

Abstract

Learning from Label Proportions (LLP) is a learning problem where only aggregate-level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information, through the constraint that instances with similar covariates should have similar labels, and b) the bag-level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. We then repeat the two steps, using the second step's embeddings as the new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP binary classification problem on various dataset types, both tabular and image. Because Belief Propagation is efficient, we achieve these improvements with minimal computational overhead above standard supervised learning, even for large bag sizes and a million samples.
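The Pseudo Labeling step can be illustrated with a toy sketch. The energy function below (a disagreement penalty on neighbour edges plus a squared penalty on each bag's positive count) and the weights `lam_edge` and `lam_bag` are assumptions chosen for illustration, not the paper's exact potentials. The marginals are computed by brute-force enumeration rather than Belief Propagation, so this only works for a handful of instances.

```python
import itertools
import math


def pseudo_label_marginals(edges, bags, bag_counts, lam_edge=1.0, lam_bag=2.0):
    """Exact marginals P(y_i = 1) under a toy Gibbs distribution.

    edges:      (i, j) pairs of instances with similar covariates (e.g. k-NN edges)
    bags:       list of bags, each a list of instance indices
    bag_counts: number of positive labels reported for each bag
    The energy form is a hypothetical stand-in, not the paper's potentials.
    """
    n = 1 + max(i for bag in bags for i in bag)
    total_weight = 0.0
    positive_weight = [0.0] * n
    # Brute-force enumeration of all 2^n labelings (BP would replace this loop).
    for y in itertools.product((0, 1), repeat=n):
        # Covariate term: penalise neighbouring instances that disagree.
        energy = lam_edge * sum(y[i] != y[j] for i, j in edges)
        # Bag term: penalise deviation from the bag's aggregate label.
        energy += lam_bag * sum(
            (sum(y[i] for i in bag) - count) ** 2
            for bag, count in zip(bags, bag_counts)
        )
        weight = math.exp(-energy)
        total_weight += weight
        for i in range(n):
            if y[i] == 1:
                positive_weight[i] += weight
    return [w / total_weight for w in positive_weight]


# Six instances split into two bags of three; bag 0 has 2 positives, bag 1 has 1.
edges = [(0, 1), (2, 3), (1, 4)]
bags = [[0, 2, 4], [1, 3, 5]]
bag_counts = [2, 1]
marginals = pseudo_label_marginals(edges, bags, bag_counts)
pseudo_labels = [int(p > 0.5) for p in marginals]
```

In the full method these pseudo labels would then supervise a learner whose intermediate embedding defines the neighbour graph for the next iteration; that part is omitted here.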

Paper Structure (37 sections, 2 theorems, 10 equations, 12 figures, 16 tables, 1 algorithm)

Introduction
Related Work
Problem Setting and Overview of Our Solution
Details of our Algorithm
Step 1: Obtaining Pseudo-Labels through Belief Propagation (BP)
Step 2: Embedding Refinement Leveraging Pseudo Labels
Iterative Refinement
Experiments
Experimental Setup
Performance Analysis
Ablations
Time Complexity
Importance of Nearest Neighbor constraints for BP
Learning Aggregate Embeddings Helps
Conclusion
...and 22 more sections

Key Result

Theorem 1

Let $f: \mathcal{A} \rightarrow \mathbb{R}$ be a real-valued function. Let $\tau = \Delta f \sqrt{2 \ln (1.25/\delta)}/\epsilon$. The Gaussian Mechanism, which adds independently drawn random noise distributed as $\mathcal{N}(0, \tau^2)$ to output of $f(A)$, ensures $(\epsilon, \delta)$-differential
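The noise scale in Theorem 1 can be computed directly from the formula $\tau = \Delta f \sqrt{2 \ln (1.25/\delta)}/\epsilon$. The following is a minimal sketch using only the Python standard library; the function names, and the example of privatizing a bag's label proportion with sensitivity $1/\text{bag size}$, are illustrative assumptions rather than details taken from the paper.

```python
import math
import random


def gaussian_noise_scale(sensitivity, epsilon, delta):
    """Return tau = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Add N(0, tau^2) noise to a real-valued query output."""
    rng = rng or random.Random()
    tau = gaussian_noise_scale(sensitivity, epsilon, delta)
    return value + rng.gauss(0.0, tau)


# Hypothetical usage: a bag of 32 binary labels with 8 positives.
# Changing one label moves the proportion by at most 1/32 (assumed sensitivity).
bag_size = 32
true_proportion = 8 / bag_size
noisy_proportion = gaussian_mechanism(
    true_proportion,
    sensitivity=1.0 / bag_size,
    epsilon=0.5,
    delta=1e-5,
    rng=random.Random(0),  # seeded only to make the sketch reproducible
)
```

The noisy value can fall outside $[0, 1]$; whether and how to clip it afterwards is a separate design choice not covered here.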

Figures (12)

Figure 1: 12 instances placed equally into 3 bags. Step 1: On the k-NN graph induced by the covariate embeddings, we perform belief propagation to obtain pseudo-labels that respect edge constraints and bag constraints. Step 2: We then fit an MLP to the instance pseudo-labels and the bag aggregate label. The embedding learned in an intermediate layer is used to further refine the k-NN graph in Step 1. (Figures best viewed in colour.)
Figure 2: Comparison of performance when adding different percentages of neighbours on Marketing for bag sizes 8 and 512.
Figure 3: Comparison of performance across different types of pooling for the aggregate-embedding loss at the end of iteration $1$, with bag size 512 for Adult and Marketing and bag size 32 for Criteo, CIFAR-B and CIFAR-S.
Figure 4: Change in Test AUROC when using only 1 nearest neighbour for covariate factor creation versus using an optimal higher $k$.
Figure 5: Recovered performance on adding noise to the initial embeddings.
...and 7 more figures

Theorems & Definitions (5)

Theorem 1: Theorem 2 in dwork2014analyze
Lemma 1
proof
Claim 1
proof
