The sample complexity of multi-distribution learning
Binghui Peng
TL;DR
The paper tackles multi-distribution learning, where the goal is to learn a hypothesis whose worst-case population loss across $k$ distributions is within $\epsilon$ of the optimal worst-case loss achievable in a hypothesis class of VC dimension $d$. It introduces a boosting framework based on multiplicative weight updates (MWU), together with a novel recursive width-reduction technique that reduces the number of MWU rounds. The result is a near-optimal sample complexity of $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$, where $\widetilde{O}$ hides polylogarithmic factors. Central to the approach are width reduction, the construction of an $\epsilon$-cover, and soundness/completeness properties that preserve the optimal classifier while allowing losses to be truncated aggressively. The method also removes the need to know OPT exactly: it runs over a grid of candidate OPT values and refines, which yields the final algorithm. These results resolve the COLT 2023 open problem and show that, up to sub-polynomial factors, multi-distribution learning costs only an additive $k\epsilon^{-2}$ samples beyond the $\Theta(d\epsilon^{-2})$ needed for single-distribution agnostic PAC learning. The techniques may also be of broader interest for boosting in agnostic, multi-distribution settings.
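To make the MWU boosting framework concrete, here is a minimal sketch of the classic game-theoretic baseline it builds on. It is not the paper's algorithm: it omits width reduction, the $\epsilon$-cover, loss truncation and the OPT grid. As simplifying assumptions, the hypothesis class is finite and the loss of every hypothesis on every distribution is given as a precomputed table `losses[h][i]` with values in $[0, 1]$, so no sampling is modeled; the function name `mwu_multi_distribution` is hypothetical. Hedge keeps weights over the $k$ distributions, the learner best-responds to the weighted mixture, and the output is the uniform mixture of the chosen hypotheses. Under the standard Hedge analysis, the mixture's worst-case loss exceeds OPT by a regret term of order $\sqrt{\log k / T}$ after $T$ rounds with a suitably tuned step size.

```python
import math
from typing import List, Sequence, Tuple


def mwu_multi_distribution(
    losses: Sequence[Sequence[float]],
    rounds: int,
    eta: float,
) -> Tuple[List[int], List[float]]:
    """Hypothetical sketch: Hedge over k distributions vs. a best-responding learner.

    Assumption: losses[h][i] is a precomputed loss in [0, 1] of hypothesis h
    on distribution i (finite class, no sampling).
    Returns the indices of the chosen hypotheses (their uniform mixture is the
    output predictor) and that mixture's average loss on each distribution.
    """
    num_h = len(losses)
    k = len(losses[0])
    log_w = [0.0] * k  # log-weights over distributions; start uniform
    chosen: List[int] = []
    for _ in range(rounds):
        # Normalize the weights into a distribution p (shift by max for stability).
        m = max(log_w)
        w = [math.exp(x - m) for x in log_w]
        total = sum(w)
        p = [x / total for x in w]
        # Learner best-responds: minimize the p-weighted loss (ties -> lowest index).
        best = min(
            range(num_h),
            key=lambda h: sum(p[i] * losses[h][i] for i in range(k)),
        )
        chosen.append(best)
        # Hedge step: upweight distributions on which the chosen hypothesis does badly.
        for i in range(k):
            log_w[i] += eta * losses[best][i]
    # Average loss of the uniform mixture over the chosen hypotheses.
    avg = [sum(losses[h][i] for h in chosen) / rounds for i in range(k)]
    return chosen, avg
```

For instance, with two hypotheses that each fail completely on one of two distributions (`losses = [[0.0, 1.0], [1.0, 0.0]]`), every single hypothesis has worst-case loss 1, while the sketch alternates between them and the returned mixture has loss 0.5 on both distributions. This illustrates why MWU-style methods return a mixture of hypotheses rather than a single one.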
Abstract
Multi-distribution learning generalizes classic PAC learning to handle data coming from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over the $k$ distributions, up to $\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm of sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao [AHZ23].
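Written out, the learning goal reads as follows. Here $\mathcal{D}_1, \dots, \mathcal{D}_k$ are the data distributions, $\mathcal{H}$ is the hypothesis class of VC dimension $d$, $\hat{h}$ is the learned hypothesis, and $\mathcal{L}_{\mathcal{D}_i}(h)$ denotes the population loss of $h$ on $\mathcal{D}_i$; this notation is introduced here for illustration:

$$\mathrm{OPT} := \min_{h \in \mathcal{H}} \max_{i \in [k]} \mathcal{L}_{\mathcal{D}_i}(h), \qquad \max_{i \in [k]} \mathcal{L}_{\mathcal{D}_i}(\hat{h}) \le \mathrm{OPT} + \epsilon.$$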
