Derandomizing Multi-Distribution Learning

Kasper Green Larsen, Omar Montasser, Nikita Zhivotovskiy

TL;DR

The paper shows that derandomizing multi-distribution learning is computationally hard, even when ERM is computationally efficient, and identifies a structural condition that enables an efficient black-box reduction converting existing randomized multi-distribution predictors into deterministic ones.

Abstract

Multi-distribution or collaborative learning involves learning a single predictor that works well across multiple data distributions, using samples from each during training. Recent research on multi-distribution learning, focusing on binary loss and classes of finite VC dimension, has shown near-optimal sample complexity that is achieved with oracle-efficient algorithms. That is, these algorithms are computationally efficient given an efficient ERM for the class. Unlike in classical PAC learning, where the optimal sample complexity is achieved with deterministic predictors, current multi-distribution learning algorithms output randomized predictors. This raises the question: can these algorithms be derandomized to produce a deterministic predictor for multiple distributions? Through a reduction to discrepancy minimization, we show that derandomizing multi-distribution learning is computationally hard, even when ERM is computationally efficient. On the positive side, we identify a structural condition enabling an efficient black-box reduction, converting existing randomized multi-distribution predictors into deterministic ones.
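
To make the distinction between randomized and deterministic predictors concrete, here is a minimal Python sketch on a toy problem. It is not the paper's algorithm or its hardness construction. The threshold hypotheses, the three toy samples, the uniform mixture, and all function names are assumptions chosen for illustration. The sketch compares the expected error of a randomized predictor (draw a hypothesis from a mixture, then predict) with the error of its weighted majority vote, a simple deterministic substitute. By a standard Markov-type argument, the majority vote errs at most twice as often as the mixture on every distribution; the question raised in the abstract is whether a deterministic predictor can match the randomized guarantee without such a loss.

```python
def make_threshold(t):
    """Hypothetical hypothesis: predict 1 exactly when x >= t."""
    return lambda x: 1 if x >= t else 0

def randomized_error(weights, hypotheses, sample):
    """Expected 0/1 error when a hypothesis is drawn from `weights` per prediction."""
    total = 0.0
    for x, y in sample:
        # Probability mass of hypotheses that mislabel (x, y).
        total += sum(w for w, h in zip(weights, hypotheses) if h(x) != y)
    return total / len(sample)

def majority_vote(weights, hypotheses):
    """Deterministic predictor: weighted majority of the mixture (ties go to 1)."""
    def predict(x):
        mass_on_one = sum(w for w, h in zip(weights, hypotheses) if h(x) == 1)
        return 1 if mass_on_one >= 0.5 else 0
    return predict

def deterministic_error(predictor, sample):
    """Empirical 0/1 error of a deterministic predictor."""
    return sum(predictor(x) != y for x, y in sample) / len(sample)

# Toy setup (assumed): three "distributions", each a sample over x = 0..9
# labelled by a different threshold.
thresholds = [2, 5, 8]
hypotheses = [make_threshold(t) for t in thresholds]
weights = [1 / 3, 1 / 3, 1 / 3]  # uniform mixture over the three hypotheses
samples = [[(x, 1 if x >= t else 0) for x in range(10)] for t in thresholds]

vote = majority_vote(weights, hypotheses)
results = []
for sample in samples:
    rand_err = randomized_error(weights, hypotheses, sample)
    det_err = deterministic_error(vote, sample)
    results.append((rand_err, det_err))

# The multi-distribution objective is the worst case over distributions.
worst_randomized = max(r for r, _ in results)
worst_deterministic = max(d for _, d in results)
print(worst_randomized, worst_deterministic)
```

In this toy instance the majority vote happens to do as well as the mixture in the worst case; in general only the factor-two bound is guaranteed for this simple construction.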

Paper Structure (8 sections, 6 theorems, 25 equations, 1 algorithm)

Key Result

Theorem 1

If $\textnormal{BPP} \neq \textnormal{NP}$, then as $n = \min\{d,k,1/\varepsilon\}$ tends to infinity, for every hypothesis class $\mathcal{H}$ of VC-dimension $d$ for which one can find $n$ points shattered by $\mathcal{H}$ in polynomial time, any multi-distribution learning algorithm for $\mathcal

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Theorem 3: discrepancy
  • proof
  • Lemma 4
  • Lemma 5
  • proof: Proof of thm:algo
  • proof: Proof of lem:heavy
  • proof: Proof of lem:light
  • Theorem 6: hoeffdingkwise