A short note on learning discrete distributions

Clément L. Canonne

TL;DR

The note analyzes the sample complexity of learning a discrete distribution on a known domain of size $k$, to accuracy $\varepsilon$ and with failure probability at most $\delta$, under several distance measures. It gives concise proofs based on the empirical distribution and on concentration arguments, showing that learning under total variation and Hellinger distances requires $n=\Theta\big(\frac{k+\log(1/\delta)}{\varepsilon^2}\big)$ samples, while KL divergence admits the optimal $n=\Theta\big(\frac{k+\log(1/\delta)}{\varepsilon}\big)$ with the empirical estimator, and Kolmogorov, $\ell_{\infty}$, and $\ell_2$ distances admit $n=\Theta\big(\frac{\log(1/\delta)}{\varepsilon^2}\big)$, independent of $k$. The results rely on standard concentration inequalities (McDiarmid, Chernoff, DKW) and on recent KL-concentration bounds to relate the performance of the empirical estimator across distance measures. Overall, the note establishes these folklore sample-complexity bounds with simple, self-contained proofs and highlights how the optimal rate depends on the chosen distance measure.
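To make the total variation bound concrete, here is a minimal Python sketch of the empirical estimator together with a sample-size rule of the form $(k+\log(1/\delta))/\varepsilon^2$. The function names and the leading constant `c` are assumptions made for illustration only: the $\Theta(\cdot)$ statement above hides the constant, and the note itself is not quoted here.

```python
import math
import random
from collections import Counter


def empirical_distribution(samples, k):
    """Empirical frequencies of the samples over the domain {0, ..., k-1}."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(i, 0) / n for i in range(k)]


def total_variation(p, q):
    """Total variation distance: half the l1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def suggested_sample_size(k, eps, delta, c=1.0):
    """Sample size of order (k + log(1/delta)) / eps^2.

    The constant c is hypothetical; the Theta(.) bound does not specify it.
    """
    return math.ceil(c * (k + math.log(1 / delta)) / eps ** 2)


# Draw samples from a uniform distribution on k elements and measure the
# total variation error of the empirical estimator.
rng = random.Random(0)
k = 10
p = [1 / k] * k
n = suggested_sample_size(k, eps=0.1, delta=0.05)
samples = rng.choices(range(k), weights=p, k=n)
p_hat = empirical_distribution(samples, k)
tv_error = total_variation(p, p_hat)
print(f"n = {n}, TV(p, p_hat) = {tv_error:.4f}")
```

Rerunning with a smaller `eps` or a larger `k` shows how the suggested sample size grows; the observed error depends on the random draw and on the assumed constant.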

Abstract

The goal of this short note is to provide simple proofs for the "folklore facts" on the sample complexity of learning a discrete probability distribution over a known domain of size $k$ to various distances $\varepsilon$, with error probability $\delta$.
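For reference, the learning task and the distance measures named in the TL;DR can be written out as follows. These are the standard textbook definitions, stated here as an assumption: the note's own normalizations (for instance the $1/\sqrt{2}$ factor in the Hellinger distance) and the direction of the KL divergence may differ. Here $p$ is the unknown distribution on $\{1,\dots,k\}$ and $\hat{p}$ is the learner's output after $n$ samples.

```latex
% Learning goal: with probability at least 1 - \delta over the n samples,
\Pr\big[\, d(p, \hat{p}) \le \varepsilon \,\big] \ge 1 - \delta .

% Distance measures d(p, q) between distributions p, q on \{1, \dots, k\}:
\begin{align*}
  d_{\mathrm{TV}}(p, q) &= \frac{1}{2} \sum_{i=1}^{k} |p_i - q_i|
      && \text{(total variation)} \\
  d_{\mathrm{H}}(p, q) &= \frac{1}{\sqrt{2}}
      \Big( \sum_{i=1}^{k} \big(\sqrt{p_i} - \sqrt{q_i}\big)^2 \Big)^{1/2}
      && \text{(Hellinger)} \\
  \mathrm{KL}(p \,\|\, q) &= \sum_{i=1}^{k} p_i \log \frac{p_i}{q_i}
      && \text{(Kullback--Leibler divergence)} \\
  d_{\mathrm{K}}(p, q) &= \max_{1 \le j \le k}
      \Big| \sum_{i=1}^{j} (p_i - q_i) \Big|
      && \text{(Kolmogorov)} \\
  \ell_\infty(p, q) &= \max_{1 \le i \le k} |p_i - q_i| , \qquad
  \ell_2(p, q) = \Big( \sum_{i=1}^{k} (p_i - q_i)^2 \Big)^{1/2}
\end{align*}
```

Note that the KL divergence is neither symmetric nor a metric, which is why "distance measure" is used loosely for it.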
