A short note on learning discrete distributions
Clément L. Canonne
TL;DR
The note analyzes the sample complexity of learning a discrete distribution on a known domain of size $k$ under several distance measures. It gives concise proofs based on the empirical distribution and on concentration arguments, showing that learning under total variation and Hellinger distances requires $n=\Theta\big(\frac{k+\log(1/\delta)}{\varepsilon^2}\big)$ samples, while KL divergence admits the optimal $n=\Theta\big(\frac{k+\log(1/\delta)}{\varepsilon}\big)$ with the empirical estimator, and Kolmogorov, $\ell_{\infty}$, and $\ell_2$ distances admit $n=\Theta\big(\frac{\log(1/\delta)}{\varepsilon^2}\big)$, independent of $k$. The results rely on standard concentration inequalities (McDiarmid, Chernoff, DKW) and recent KL-concentration bounds to analyze the empirical estimator under each distance measure. Overall, the note clarifies folklore sample-complexity bounds with simple, self-contained proofs and highlights how the optimal rate depends on the chosen distance measure.
Abstract
The goal of this short note is to provide simple proofs for the "folklore facts" on the sample complexity of learning a discrete probability distribution over a known domain of size $k$ to within distance $\varepsilon$, under various distance measures, with error probability $\delta$.
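
As a rough illustration of the learning task described above, the following Python sketch draws $n$ samples from a distribution over $\{0,\dots,k-1\}$, forms the empirical distribution, and measures its total variation distance to the truth. It is a sketch under stated assumptions, not the note's own code: the function names, the uniform target distribution, and the constant `c = 1` in the sample size (the $\Theta(\cdot)$ bound leaves the constant unspecified) are all chosen here for illustration only.

```python
import math
import random
from collections import Counter


def empirical_distribution(samples, k):
    """Return the empirical frequencies over the domain {0, ..., k-1}."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(i, 0) / n for i in range(k)]


def total_variation(p, q):
    """Total variation distance: half the l1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_sample_size(k, eps, delta, c=1.0):
    """Sample size of order (k + log(1/delta)) / eps^2.

    The constant c is an assumption made for this sketch; the Theta(.)
    bound does not specify it.
    """
    return math.ceil(c * (k + math.log(1.0 / delta)) / eps**2)


rng = random.Random(0)  # fixed seed so the run is reproducible
k, eps, delta = 10, 0.1, 0.05
p = [1.0 / k] * k  # hypothetical target: uniform over k elements

n = tv_sample_size(k, eps, delta)
samples = rng.choices(range(k), weights=p, k=n)
p_hat = empirical_distribution(samples, k)
tv = total_variation(p, p_hat)

print(f"n = {n}, TV(p, p_hat) = {tv:.4f}")
```

Repeating the experiment with different seeds and counting how often the distance exceeds `eps` gives an empirical estimate of the failure probability, which can be compared against `delta`.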
