ShakyPrepend: A Multi-Group Learner with Improved Sample Complexity

Lujing Zhang; Daniel Hsu; Sivaraman Balakrishnan

TL;DR

This work proposes ShakyPrepend, a multi-group learning method that uses tools inspired by differential privacy to obtain improved sample-complexity guarantees over existing approaches. The work also provides practical guidance for deploying multi-group learning algorithms in real-world settings.

Abstract

Multi-group learning is a learning task that focuses on controlling predictors' conditional losses over specified subgroups. We propose ShakyPrepend, a method that leverages tools inspired by differential privacy to obtain improved theoretical guarantees over existing approaches. Through numerical experiments, we demonstrate that ShakyPrepend adapts to both group structure and spatial heterogeneity. We provide practical guidance for deploying multi-group learning algorithms in real-world settings.
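To make "controlling conditional losses over specified subgroups" concrete, here is a minimal sketch. It assumes the common multi-group formulation in which a predictor f is compared, group by group, with the best single hypothesis from a finite benchmark class H on that group's empirical conditional loss. The function names and array layout are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def conditional_loss(losses: np.ndarray, mask: np.ndarray) -> float:
    """Empirical conditional loss L_n(h | g): mean loss over points in group g."""
    if not mask.any():
        raise ValueError("conditional loss is undefined for an empty group")
    return float(losses[mask].mean())


def multigroup_excess(pred_losses, class_losses, groups):
    """Return L_n(f | g) - min_{h in H} L_n(h | g) for every group g.

    pred_losses:  shape (n,), per-point losses of the learned predictor f.
    class_losses: shape (|H|, n), per-point losses of each benchmark h in H.
    groups:       shape (|G|, n), boolean membership (groups may overlap).
    """
    excess = []
    for mask in groups:
        f_loss = conditional_loss(pred_losses, mask)
        best_in_class = min(conditional_loss(row, mask) for row in class_losses)
        excess.append(f_loss - best_in_class)
    return np.array(excess)


# Toy example: two overlapping groups, two benchmark hypotheses.
groups = np.array([[True, True, False, False],
                   [False, True, True, True]])
pred_losses = np.array([0.0, 0.5, 1.0, 0.0])
class_losses = np.array([[1.0, 0.0, 0.0, 0.0],
                         [0.0, 1.0, 1.0, 1.0]])
excess = multigroup_excess(pred_losses, class_losses, groups)
print(excess)  # per-group excess conditional loss
```

In this toy case f beats the best benchmark hypothesis on the first group by 0.25 but trails it by 0.5 on the second, so a requirement of excess at most ε on every group fails for any ε < 0.5 even though f is competitive on the first group.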

Paper Structure (26 sections, 15 theorems, 76 equations, 10 figures, 7 algorithms)

Introduction
Related Work
Paper Outline
Background
Setting and Notation
Empirical Conditional Risk Convergence Result
Differential Privacy
Shaky Prepend: Algorithm and Guarantees
Fractional Variants of Prepend-Like Algorithms
Relation with Gradient Boosting
Experiments
Criterion Selection: Guidance for Hyperparameter Tuning
Unbalanced-Group Setting: Group-Size Adaptivity of Group Prepend and Shaky Prepend
Spatial Adaptivity
Fractional Variants of Prepend-Like Algorithms
...and 11 more sections

Key Result

Theorem 1

Let $\mathcal{H}$ be a hypothesis class, let $\mathcal{G}$ be a set of groups, and let $\ell: \mathcal{Z} \times \mathcal{Y} \to [0,1]$ be a loss function. With probability at least $1 - \delta$, where $D = 2 \ln\left( \Pi_{2n}( (\ell \circ \mathcal{H})_{\mathrm{thresh}} ) \Pi_{2n}(\mathcal{G}) \right) + \ln\left( \frac{8}{\delta} \right).$
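The complexity term $D$ depends on the growth functions $\Pi_{2n}$. When only VC dimensions are known, the Sauer-Shelah lemma, $\Pi_m(\mathcal{F}) \le \sum_{i=0}^{d} \binom{m}{i}$ for a class of VC dimension $d$, upper-bounds each growth function and therefore $D$. A minimal sketch, assuming the VC dimensions of $(\ell \circ \mathcal{H})_{\mathrm{thresh}}$ and of $\mathcal{G}$ are supplied as inputs; the function names are hypothetical.

```python
import math


def sauer_shelah_bound(m: int, d: int) -> int:
    """Upper bound on the growth function Pi_m(F) for VC dimension d.

    Sauer-Shelah: Pi_m(F) <= sum_{i=0}^{d} C(m, i). When d >= m the sum
    equals 2^m, the trivial bound. Returned as an exact Python integer.
    """
    return sum(math.comb(m, i) for i in range(min(d, m) + 1))


def complexity_term_D(n: int, vc_thresh_loss: int, vc_groups: int,
                      delta: float) -> float:
    """Upper bound on D = 2 ln(Pi_2n(thresh) * Pi_2n(G)) + ln(8 / delta).

    vc_thresh_loss: assumed VC dimension of the thresholded loss class.
    vc_groups:      assumed VC dimension of the group class G.
    """
    m = 2 * n
    log_growth = (math.log(sauer_shelah_bound(m, vc_thresh_loss))
                  + math.log(sauer_shelah_bound(m, vc_groups)))
    return 2.0 * log_growth + math.log(8.0 / delta)


print(complexity_term_D(n=1000, vc_thresh_loss=5, vc_groups=2, delta=0.05))
```

Because $D$ grows only logarithmically in the growth functions, polynomial growth (finite VC dimension) gives $D = O(d \log n)$. For a finite collection of groups one can also use $\Pi_{2n}(\mathcal{G}) \le |\mathcal{G}|$ directly.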

Figures (10)

Figure 1: Criterion-selection setup. Top: Ground truth (orange) and noisy training samples (blue). Bottom: Groups are constructed so that the worst-group loss is not aligned with the total loss.
Figure 2: Criterion selection (large sample). Total loss (left) and worst-group loss (right) when tuning hyperparameters by total loss vs. worst-group loss (20 runs; 26,000 training points per run).
Figure 3: Criterion selection (small sample). Total loss (left) and worst-group loss (right) when tuning hyperparameters by total loss vs. worst-group loss (20 runs; 260 training points per run).
Figure 4: Unbalanced-group setup. Top: Ground truth (orange) and noisy training samples (blue). Bottom: Layered group intervals, where only part of the domain is further refined into smaller subgroups, yielding unbalanced granularity.
Figure 5: Unbalanced-group simulation. Total loss (left) and worst-group loss (right) for Prepend, Group Prepend, and Shaky Prepend (20 runs; 120 training points per run).
...and 5 more figures
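Figures 2 and 3 compare tuning hyperparameters by total loss against tuning by worst-group loss. The sketch below shows how the two validation criteria can disagree. It assumes per-point validation losses and boolean group memberships (groups may overlap or nest, as in the layered intervals of Figure 4); the setting names "A" and "B" and all numbers are hypothetical, not the paper's experimental code or data.

```python
import numpy as np


def total_and_worst_group_loss(losses, groups):
    """Return (total loss, worst-group loss) on a validation set.

    losses: shape (n,), per-point validation losses (e.g., squared errors).
    groups: shape (|G|, n), boolean membership; empty groups are skipped.
    """
    total = float(losses.mean())
    worst = max(float(losses[mask].mean()) for mask in groups if mask.any())
    return total, worst


def select_setting(losses_by_setting, groups, criterion):
    """Return the hyperparameter setting that minimizes the chosen criterion."""
    index = {"total": 0, "worst_group": 1}[criterion]
    scores = {setting: total_and_worst_group_loss(losses, groups)[index]
              for setting, losses in losses_by_setting.items()}
    return min(scores, key=scores.get)


# Hypothetical settings "A" and "B" evaluated on four validation points.
groups = np.array([[True, True, False, False],
                   [False, False, True, True]])
losses_by_setting = {
    "A": np.array([0.0, 0.0, 0.0, 0.8]),  # low total, one bad group
    "B": np.array([0.3, 0.3, 0.3, 0.3]),  # higher total, uniform
}
print(select_setting(losses_by_setting, groups, "total"))        # A
print(select_setting(losses_by_setting, groups, "worst_group"))  # B
```

Setting A has total loss 0.2 but worst-group loss 0.4, while B has 0.3 on both, so the two criteria pick different settings whenever the worst group is not aligned with the total loss.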

Theorems & Definitions (32)

Theorem 1: [tosh2024simplenearoptimalalgorithmshidden]
Definition 1: Differential Privacy
Definition 2: $\ell_1$-sensitivity
Theorem 2
Remark 1
Remark 2
Theorem 3
Proof sketch
Lemma 1: Upper Bound for Update Times
Lemma 2: Differential Privacy
...and 22 more
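Definitions 1 and 2 concern differential privacy and $\ell_1$-sensitivity. As background only, the textbook way the two fit together is the Laplace mechanism: releasing a query plus Laplace noise of scale (sensitivity)/$\varepsilon$ is $\varepsilon$-differentially private. The sketch applies it to a group-risk query. The query, function names, and parameter values are hypothetical, and this is not claimed to be the noise mechanism ShakyPrepend uses.

```python
import numpy as np


def group_risk_query(losses, mask):
    """q(S) = (1/n) * sum_i 1[x_i in g] * loss_i, each loss_i in [0, 1].

    Replacing one example changes one summand by at most 1, so the
    l1-sensitivity of q under replace-one neighbors is 1/n.
    Returns (query value, its l1-sensitivity).
    """
    n = losses.shape[0]
    return float((losses * mask).sum() / n), 1.0 / n


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(0, sensitivity / epsilon); epsilon-DP."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


rng = np.random.default_rng(0)
losses = np.array([0.2, 1.0, 0.0, 0.6])
mask = np.array([True, False, True, True])
value, sensitivity = group_risk_query(losses, mask)
noisy = laplace_mechanism(value, sensitivity, epsilon=1.0, rng=rng)
print(value, sensitivity, noisy)
```

Smaller $\varepsilon$ or smaller $n$ means a larger noise scale, which is the basic privacy-accuracy trade-off behind DP-inspired analyses.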
