Optimal Learning from Label Proportions with General Loss Functions

Lorne Applebaum, Travis Dick, Claudio Gentile, Haim Kaplan, Tomer Koren

TL;DR

The paper tackles Learning from Label Proportions (LLP), where training labels are available only as bag-level proportions. It introduces a low-variance debiasing method that builds unbiased bag-level estimators for general loss functions across binary and multiclass tasks, with a variance bound independent of the bag size $k$. It then develops a Median-of-Means tournament that selects hypotheses using pairwise loss differences, achieving regret bounds that scale with $k$, the number of classes $c$, and the number of bags $m$. Empirical results on MNIST, CIFAR-10, Higgs, Adult, and Criteo demonstrate strong performance, especially for large bag sizes, and establish competitive baselines in both batch and online settings. This framework broadens LLP applicability to practical losses and to real-world large-scale datasets such as online advertising conversion prediction.
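The Median-of-Means (MoM) tournament mentioned above can be illustrated with a generic sketch. It assumes each hypothesis already has one loss estimate per bag (for example, a debiased bag loss) and compares two hypotheses through the median of block means of their paired per-bag differences, which limits the influence of a few extreme bags. The function names, the block count, and the "most wins" selection rule are hypothetical choices for illustration and may differ from the paper's algorithm.

```python
from typing import Sequence


def median_of_means(values: Sequence[float], n_blocks: int) -> float:
    """Split values into n_blocks contiguous blocks of equal size (dropping any
    remainder), average each block, and return the median of the block means."""
    if not 1 <= n_blocks <= len(values):
        raise ValueError("need 1 <= n_blocks <= len(values)")
    size = len(values) // n_blocks
    means = sorted(
        sum(values[b * size:(b + 1) * size]) / size for b in range(n_blocks)
    )
    mid = n_blocks // 2
    if n_blocks % 2 == 1:
        return means[mid]
    return 0.5 * (means[mid - 1] + means[mid])


def mom_tournament(bag_losses: Sequence[Sequence[float]], n_blocks: int) -> int:
    """Hypothetical tournament: bag_losses[j][b] is the loss estimate of
    hypothesis j on bag b. Hypothesis i beats j when the MoM of the paired
    differences loss_i - loss_j is negative. Returns the index with the most
    wins (ties go to the lowest index, since max() keeps the first maximum)."""
    n_hyp = len(bag_losses)
    wins = [0] * n_hyp
    for i in range(n_hyp):
        for j in range(i + 1, n_hyp):
            diffs = [a - b for a, b in zip(bag_losses[i], bag_losses[j])]
            score = median_of_means(diffs, n_blocks)
            if score < 0:
                wins[i] += 1
            elif score > 0:
                wins[j] += 1
    return max(range(n_hyp), key=lambda h: wins[h])


# Hypothesis 0 is better on 9 of 10 bags but has one extreme bag.
losses = [[1.0] * 9 + [100.0], [1.5] * 10, [2.0] * 10]
best = mom_tournament(losses, n_blocks=5)
```

Here the plain average of hypothesis 0's bag losses is 10.9 versus 1.5 for hypothesis 1, so selecting by the mean would prefer hypothesis 1, whereas the MoM comparison discards the single outlying block and the tournament returns index 0.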

Abstract

Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). We introduce a novel and versatile low-variance debiasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our debiasing approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions across both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we improve sample complexity guarantees for a large class of losses of practical relevance. We also empirically validate the efficacy of our proposed approach across a diverse array of benchmark datasets, demonstrating compelling empirical advantages over standard baselines.

Paper Structure

This paper contains 37 sections, 9 theorems, 243 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Lemma 3.1

Let $(x_1,y_1),\ldots,(x_k,y_k) \in \mathcal{X} \times \{0,1\}$ be drawn i.i.d. from $\mathcal{D}$. Let $z = ((x_1,\ldots,x_k),\alpha)$ be the corresponding bag, where $\alpha = \frac{1}{k}\sum_{i=1}^{k} y_i$ is its label proportion. For any function $h \in \mathcal{H}$ and any binary loss $\ell(h(x),y) = f_1(h(x)) + y\,f_2(h(x))$, we have $\mathbb{E}_z[\ell_b(h,z)] = \mathbb{E}_{(x,y)\sim\mathcal{D}}[\ell(h(x),y)]$ and $\operatorname{Var}_z(\ell_b(h,z)) \leq \frac{5}{2}\,\mathbb{E}_x\bigl[\bigl(f_2(h(x)) \ldots$
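Lemma 3.1 relies on writing the loss as $f_1(h(x)) + y\,f_2(h(x))$. For log loss with prediction $u = h(x)$, one valid split is $f_1(u) = -\log(1-u)$ and $f_2(u) = \log\frac{1-u}{u}$, since $f_1 + f_2 = -\log u$. The sketch below pairs that split with a simpler debiasing scheme that assumes the class prior $p = \Pr(y=1)$ is known. It is not the low-variance estimator $\ell_b$ of Lemma 3.1 (whose definition is not given here), and its variance can grow with $k$. The toy distribution, the predictor `h`, and all function names are hypothetical; the simulation only checks that the average bag-level estimate matches the instance-level expected log loss.

```python
import math
import random


def log_loss_parts(u: float) -> tuple[float, float]:
    """Split log loss into f1(u) + y * f2(u) for a prediction u in (0, 1)."""
    f1 = -math.log(1.0 - u)        # loss when y = 0
    f2 = math.log((1.0 - u) / u)   # f1 + f2 = -log(u), the loss when y = 1
    return f1, f2


def debiased_bag_loss(preds: list[float], alpha: float, p: float) -> float:
    """Known-prior debiasing (an assumption; NOT the estimator of Lemma 3.1).

    For each i, E[(k*alpha - (k-1)*p) * f2(u_i)] = E[y_i * f2(u_i)], because the
    other k-1 labels are independent of x_i and each has mean p.
    """
    k = len(preds)
    total = 0.0
    for u in preds:
        f1, f2 = log_loss_parts(u)
        total += f1 + (k * alpha - (k - 1) * p) * f2
    return total / k


def h(x: float) -> float:
    # Hypothetical fixed predictor with outputs in (0.25, 0.75).
    return 0.25 + 0.5 * x


def simulate(k: int = 8, n_bags: int = 20000, seed: int = 0) -> float:
    # Toy distribution: x ~ Uniform(0, 1), y ~ Bernoulli(x), so p = P(y = 1) = 0.5.
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bags):
        xs = [rng.random() for _ in range(k)]
        ys = [1 if rng.random() < x else 0 for x in xs]
        alpha = sum(ys) / k  # only the label proportion is revealed to the learner
        estimates.append(debiased_bag_loss([h(x) for x in xs], alpha, p=0.5))
    return sum(estimates) / n_bags


def true_risk(n_grid: int = 100000) -> float:
    # E[log loss] = E_x[-x log h(x) - (1 - x) log(1 - h(x))], midpoint rule on [0, 1].
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) / n_grid
        total += -x * math.log(h(x)) - (1 - x) * math.log(1 - h(x))
    return total / n_grid
```

Calling `simulate()` and `true_risk()` should return values that agree up to Monte Carlo error, even though no individual label is ever used by the bag-level estimate.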

Figures (6)

  • Figure 1: Minimum average test log loss achieved for each $k$, optimized across various learning rates and stopping epochs. Results are averaged over 10 repetitions, with error bars representing one standard error.
  • Figure 2: Left: Average chunk log loss for each LLP loss and bag size. Right: The per-chunk log losses during online training for bag size $2^{11}$. Error bars indicate one standard error in the mean across 5 repetitions.
  • Figure 3: Average test log loss when training using each aggregate loss in the batch setting for all bag sizes. For each bag size we report the lowest log loss achieved over all learning rate and stopping epoch combinations. Error bars indicate one standard error in the mean across repetitions.
  • Figure 4: Average test AUC when training using each aggregate loss in the batch setting. For each bag size we report the highest AUC achieved over all learning rate and stopping epoch combinations. Error bars indicate one standard error in the mean across repetitions.
  • Figure 5: Average test AUC when training using each aggregate loss in the batch setting for all bag sizes. For each bag size we report the highest AUC achieved over all learning rate and stopping epoch combinations. Error bars indicate one standard error in the mean across repetitions.
  • ...and 1 more figure

Theorems & Definitions (18)

  • Lemma 3.1
  • Theorem 3.2
  • Example 3.3
  • Example 3.4
  • Theorem 3.5
  • Proof
  • Proposition 2.1
  • Proof
  • Proof
  • Theorem 2.2
  • ...and 8 more