Glivenko-Cantelli for $f$-divergence
Haoming Wang, Lek-Heng Lim
TL;DR
This work extends the Glivenko--Cantelli framework to general $f$-divergences by defining an $f$-divergence over the ray class $\mathcal{R}$, which is not a $\sigma$-algebra. The core idea is to use a Radon--Nikodym-type property and projected densities $\mathrm{proj}_{G(\mathcal{R})}(d\mu/d\nu)$ to define $\mathop{\mathrm{D}}_f^{\mathcal{R}}(\mu \Vert \nu)$, which recovers the Kolmogorov--Smirnov distance for $f(t)=\frac{|t-1|}{2}$ and the standard $f$-divergence when extended to $\mathcal{B}$. The paper proves linearity, nonnegativity, affine invariance, and a basic identity for these divergences, and establishes Glivenko--Cantelli theorems for $\mathcal{R}$-divergences: both $\mathop{\mathrm{D}}_f^{\mathcal{R}}(\nu_n \Vert \nu)$ and $\mathop{\mathrm{D}}_f^{\mathcal{R}}(\nu \Vert \nu_n)$ converge to zero almost surely. It also outlines a preliminary Vapnik--Chervonenkis theory for $f$-divergence via pre-Glivenko--Cantelli classes and discusses the limitations of Choquet-integral approaches in this setting. Overall, the work points toward Glivenko--Cantelli-type guarantees for a wide class of divergences beyond total variation and Kolmogorov--Smirnov, with potential relevance to learning theory and empirical process methods.
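For orientation, the role of $f(t)=\frac{|t-1|}{2}$ can be seen from the standard definitions. The display below is a sketch under assumptions not stated in this summary: $\mathcal{B}$ is taken to be the Borel $\sigma$-algebra on $\mathbb{R}$, $\mathcal{R}$ the class of rays $(-\infty, x]$, and $\mu \ll \nu$. It uses only the classical $f$-divergence, not the paper's projected-density construction.

$$
\mathop{\mathrm{D}}_f(\mu \Vert \nu) = \int f\!\left(\frac{d\mu}{d\nu}\right) d\nu,
\qquad
f(t) = \frac{|t-1|}{2}
\;\Longrightarrow\;
\mathop{\mathrm{D}}_f(\mu \Vert \nu) = \frac{1}{2}\int \left|\frac{d\mu}{d\nu} - 1\right| d\nu = \sup_{A \in \mathcal{B}} |\mu(A) - \nu(A)|.
$$

$$
\text{Kolmogorov--Smirnov distance:}\qquad \sup_{x \in \mathbb{R}} \bigl|\mu\bigl((-\infty, x]\bigr) - \nu\bigl((-\infty, x]\bigr)\bigr|.
$$

The first line is total variation distance, a supremum over all Borel sets; the second restricts the same supremum to rays. The claim summarized above is that $\mathop{\mathrm{D}}_f^{\mathcal{R}}$ with this choice of $f$ yields the second quantity.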
Abstract
We extend the celebrated Glivenko-Cantelli theorem, sometimes called the fundamental theorem of statistics, from its standard setting of total variation distance to all $f$-divergences. A key obstacle in this endeavor is to define $f$-divergence on a subcollection of a $\sigma$-algebra that forms a $\pi$-system but not a $\sigma$-subalgebra. This is a side contribution of our work. We will show that this notion of $f$-divergence on the $\pi$-system of rays preserves nearly all known properties of standard $f$-divergence, yields a novel integral representation of the Kolmogorov-Smirnov distance, and has a Glivenko-Cantelli theorem. We will also discuss the prospects of a Vapnik-Chervonenkis theory for $f$-divergence.
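The classical statement being generalized can be checked numerically. The following is a minimal sketch, not the paper's construction: it covers only the Kolmogorov-Smirnov case, assumes $\nu$ is Uniform(0, 1) so that the true distribution function is $F(x) = x$, and uses a hypothetical helper name `ks_distance_uniform` and an arbitrary seed.

```python
import random


def ks_distance_uniform(samples):
    """Kolmogorov-Smirnov distance between the empirical CDF of `samples`
    and the Uniform(0, 1) CDF F(x) = x (an assumed choice of nu)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so the
        # supremum over rays is attained at, or just before, a sample point.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def demo(sizes=(100, 10_000, 100_000), seed=0):
    """Return (n, KS distance) pairs for i.i.d. uniform samples of growing size."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        samples = [rng.random() for _ in range(n)]
        results.append((n, ks_distance_uniform(samples)))
    return results


if __name__ == "__main__":
    for n, d in demo():
        print(f"n = {n:>7}: KS distance = {d:.5f}")
```

Running the script prints the distance between the empirical measure $\nu_n$ and $\nu$ for increasing $n$; the values should shrink toward zero, which is the behavior the Glivenko-Cantelli theorem guarantees almost surely.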
