
The sample complexity of multi-distribution learning

Binghui Peng

TL;DR

The paper studies multi-distribution learning: given $k$ distributions and a hypothesis class of VC dimension $d$, the goal is to find a classifier whose worst-case population loss across the $k$ distributions is within $\epsilon$ of the optimal loss. It introduces a boosting framework based on multiplicative weight updates (MWU) and a recursive width-reduction procedure that cuts the number of MWU rounds, achieving a near-optimal sample complexity of $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$, where $\widetilde{O}$ hides polylogarithmic factors. Central to the approach are width reduction, the construction of an $\epsilon$-cover, and soundness/completeness properties that preserve the optimal classifier while allowing aggressive truncation of losses. The method does not need to know OPT exactly: it runs over a grid of candidate OPT values and refines the estimate. These results resolve the COLT 2023 open problem and show that, up to sub-polynomial factors, the sample complexity of multi-distribution learning is $(d+k)\epsilon^{-2}$, so handling $k$ distributions adds only an additive $k\epsilon^{-2}$ term to the single-distribution PAC rate. The boosting framework may also be useful for other agnostic, multi-distribution settings.
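To make the MWU idea concrete, here is a minimal sketch of the textbook game-dynamics loop that boosting approaches of this kind build on. It is not the paper's algorithm: it omits width reduction, the $\epsilon$-cover, and all sampling, and it assumes the population losses are already known as a matrix. The names `mwu_multi_distribution`, `loss`, and `rounds` are hypothetical, chosen for illustration.

```python
import math

def mwu_multi_distribution(loss, rounds):
    """Textbook MWU (Hedge) over distributions with a best-response learner.

    loss[h][i] is the (assumed known) loss in [0, 1] of hypothesis h on
    distribution i. Returns the hypotheses played and, per distribution,
    the average loss of the uniform mixture over the played hypotheses.
    """
    n_hyp, k = len(loss), len(loss[0])
    eta = math.sqrt(8.0 * math.log(k) / rounds)  # standard Hedge step size
    log_w = [0.0] * k      # log-weights over the k distributions
    avg_loss = [0.0] * k   # running average loss on each distribution
    played = []
    for _ in range(rounds):
        # Normalize weights (subtract the max for numerical stability).
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        total = sum(w)
        p = [v / total for v in w]
        # Learner best-responds to the current mixture of distributions.
        h = min(range(n_hyp),
                key=lambda j: sum(p[i] * loss[j][i] for i in range(k)))
        played.append(h)
        # Adversary up-weights distributions where the learner did badly.
        for i in range(k):
            log_w[i] += eta * loss[h][i]
            avg_loss[i] += loss[h][i] / rounds
    return played, avg_loss

# Toy instance: 3 hypotheses, 3 distributions.
loss = [[0.1, 0.5, 0.3],
        [0.4, 0.2, 0.3],
        [0.3, 0.3, 0.35]]
played, avg_loss = mwu_multi_distribution(loss, rounds=2000)
worst_case = max(avg_loss)  # worst-case loss of the uniform mixture
```

By the standard Hedge regret bound, the worst-case loss of the returned mixture exceeds the optimum by at most $\sqrt{\ln k / (2T)}$ after $T$ rounds. In the actual learning problem each round's losses must be estimated from fresh samples, which is why the number of rounds matters for sample complexity.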

Abstract

Multi-distribution learning generalizes classic PAC learning to handle data coming from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over the $k$ distributions, up to $\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm of sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao [AHZ23].
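The objective described above can be written out explicitly. The notation here ($\mathcal{H}$ for the hypothesis class, $D_1, \dots, D_k$ for the distributions, $\ell_{D_i}(h)$ for the population loss of $h$ on $D_i$) is assumed for illustration and may differ from the paper's own symbols.

```latex
\mathrm{OPT} \;=\; \min_{h \in \mathcal{H}} \, \max_{i \in [k]} \, \ell_{D_i}(h),
\qquad
\text{goal: output } \hat{h} \text{ such that }
\max_{i \in [k]} \ell_{D_i}(\hat{h}) \;\le\; \mathrm{OPT} + \epsilon .
```

Setting $k = 1$ recovers the usual agnostic PAC objective, which is the sense in which multi-distribution learning generalizes PAC learning.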

Paper Structure (12 sections, 11 theorems, 43 equations, 5 algorithms)


Key Result

Theorem 1.1

Let $k$ be the number of distributions and $d$ the VC dimension of the hypothesis class. For any $\epsilon > 0$, there is an algorithm that outputs an $\epsilon$-optimal classifier with probability $1-\delta$ and has sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$.

Theorems & Definitions (19)

  • Theorem 1.1: Multi-distribution learning
  • Definition 2.1: Multi-distribution learning
  • Lemma 2.2: Sauer–Shelah Lemma [Sauer 1972; Shelah 1972]
  • Lemma 2.3: Regret guarantee of MWU [Arora et al. 2012]
  • Lemma 3.1: Boosting framework
  • Lemma 3.2: Guarantee of $\textsc{ConstructCover}$, adapted from Lemma 3.3 of [Alon et al. 2019]
  • Lemma 3.3: Guarantee of $\textsc{Filter}$, Part 1
  • proof
  • Lemma 3.4: Guarantee of $\textsc{Filter}$, Part 2
  • proof
  • ...and 9 more