
Is Multi-Distribution Learning as Easy as PAC Learning: Sharp Rates with Bounded Label Noise

Rafael Hanashiro, Abhishek Shetty, Patrick Jaillet

TL;DR

The paper shows that learning across $k$ distributions inherently incurs slow rates scaling with $k/\epsilon^2$, even under constant noise levels, unless each distribution is learned separately.

Abstract

Towards understanding the statistical complexity of learning from heterogeneous sources, we study the problem of multi-distribution learning. Given $k$ data sources, the goal is to output a classifier for each source by exploiting shared structure to reduce sample complexity. We focus on the bounded label noise setting to determine whether the fast $1/\epsilon$ rates achievable in single-task learning extend to this regime with minimal dependence on $k$. Surprisingly, we show that this is not the case. We demonstrate that learning across $k$ distributions inherently incurs slow rates scaling with $k/\epsilon^2$, even under constant noise levels, unless each distribution is learned separately. A key technical contribution is a structured hypothesis-testing framework that captures the statistical cost of certifying near-optimality under bounded noise, a cost we show is unavoidable in the multi-distribution setting. Finally, we prove that when competing with the stronger benchmark of each distribution's optimal Bayes error, the sample complexity incurs a multiplicative penalty in $k$. This establishes a statistical separation between random classification noise and Massart noise, highlighting a fundamental barrier unique to learning from multiple sources.

Paper Structure (57 sections, 19 theorems, 142 equations, 6 algorithms)

This paper contains 57 sections, 19 theorems, 142 equations, 6 algorithms.

Key Result

Lemma 3.1

Let $\mathcal{U}\subset\{P_1,\dots,P_k\}$ and let $\bar{P}_\mathcal{U} = \frac{1}{|\mathcal{U}|} \sum_{i\in\mathcal{U}} P_i$ be their uniform mixture. Then, ERM $\hat{f} = \mathrm{ERM}_\mathcal{F}(S)$ on a sample $S\overset{\text{iid}}{\sim} \bar{P}_\mathcal{U}$ of size $|S|$ ...
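
The objects named in Lemma 3.1 (a uniform mixture over a subset of sources, and ERM run on a sample drawn from it) can be illustrated with a small sketch. Everything concrete below is an assumption made for illustration and does not come from the paper: the three toy sources, the class of threshold classifiers on $[0,1]$, the random classification noise rate of 0.1, and all function names. It is not the paper's algorithm and says nothing about the sample size the lemma requires.

```python
import random

def make_source(lo, hi, threshold, noise):
    """Hypothetical source: x uniform on [lo, hi], label 1[x >= threshold],
    flipped independently with probability `noise` (random classification noise)."""
    def draw(rng):
        x = rng.uniform(lo, hi)
        y = int(x >= threshold)
        if rng.random() < noise:
            y = 1 - y
        return x, y
    return draw

def sample_from_uniform_mixture(sources, n, rng):
    """Draw n i.i.d. points from the uniform mixture: pick a source
    uniformly at random, then draw one labeled point from it."""
    return [rng.choice(sources)(rng) for _ in range(n)]

def empirical_error(t, sample):
    """Fraction of sample points misclassified by the threshold rule 1[x >= t]."""
    return sum(int(x >= t) != y for x, y in sample) / len(sample)

def erm(thresholds, sample):
    """Empirical risk minimization over a finite class of thresholds."""
    return min(thresholds, key=lambda t: empirical_error(t, sample))

# Finite hypothesis class (assumed): thresholds 0.0, 0.1, ..., 1.0.
THRESHOLDS = [i / 10 for i in range(11)]

rng = random.Random(0)
# Three toy sources sharing the same optimal threshold 0.5 (assumed).
sources = [
    make_source(0.0, 0.5, 0.5, 0.1),
    make_source(0.5, 1.0, 0.5, 0.1),
    make_source(0.0, 1.0, 0.5, 0.1),
]
sample = sample_from_uniform_mixture(sources, 3000, rng)
f_hat = erm(THRESHOLDS, sample)
print("ERM threshold on the mixture sample:", f_hat)
```

In this toy setup all sources share one optimal classifier, so a single ERM call on the pooled sample serves every source. The hardness results summarized in the abstract concern what happens beyond such favorable cases.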

Theorems & Definitions (29)

  • Lemma 3.1
  • Lemma 3.2
  • Theorem 3.3: MDL-RCN upper bound
  • Theorem 3.4: MDL-MM upper bound
  • Remark: Condition for separate learning
  • Lemma 4.1: Testing via empirical errors
  • Lemma 4.2: From learning to testing
  • Theorem 4.3: SHT upper bound
  • Theorem 4.4: SHT lower bound
  • Proof: Proof sketch of the SHT lower bound
  • ...and 19 more