Glivenko-Cantelli for $f$-divergence

Haoming Wang, Lek-Heng Lim

TL;DR

This work extends the Glivenko--Cantelli framework to general $f$-divergences by introducing an $f$-divergence over the ray class $\mathcal{R}$, addressing the challenge that $\mathcal{R}$ is not a $\sigma$-algebra. The core idea is to leverage a Radon--Nikodym-type property and projected densities $\mathrm{proj}_{G(\mathcal{R})}(d\mu/d\nu)$ to define $\mathop{\mathrm{D}}_f^{\mathcal{R}}(\mu \Vert \nu)$, which recovers the Kolmogorov--Smirnov (KS) distance for $f(t)=\frac{|t-1|}{2}$ and the standard $f$-divergence when extended to $\mathcal{B}$. The paper proves linearity, nonnegativity, affine invariance, and a basic identity for these divergences, and establishes Glivenko--Cantelli theorems for $\mathcal{R}$-divergences: both $\mathop{\mathrm{D}}_f^{\mathcal{R}}(\nu_n \Vert \nu)$ and $\mathop{\mathrm{D}}_f^{\mathcal{R}}(\nu \Vert \nu_n)$ converge to zero almost surely. It also outlines a preliminary Vapnik--Chervonenkis theory for $f$-divergence via pre-Glivenko--Cantelli classes and discusses the limitations of Choquet-integral approaches in this setting. Overall, the work extends almost-sure convergence guarantees for empirical measures from total variation and KS to a wide class of divergences, with potential applications in learning theory and empirical process methods.
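The special case $f(t)=\frac{|t-1|}{2}$, where the divergence over rays reduces to the Kolmogorov--Smirnov distance, can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes a standard normal $\nu$, takes rays to be the intervals $(-\infty, x]$, and uses hypothetical helper names (`ks_distance`, `normal_cdf`).

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal distribution (the assumed nu)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_distance(sample, cdf):
    """Sup over rays (-inf, x] of |nu_n - nu| for a continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        F = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so the supremum is attained on one side of a jump.
        d = max(d, abs((i + 1) / n - F), abs(i / n - F))
    return d


rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {}
for n in (100, 10_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    distances[n] = ks_distance(sample, normal_cdf)
    print(n, round(distances[n], 4))
```

The printed distance should be markedly smaller for the larger sample, which is the behavior the Glivenko--Cantelli theorem guarantees almost surely in the limit.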

Abstract

We extend the celebrated Glivenko-Cantelli theorem, sometimes called the fundamental theorem of statistics, from its standard setting of total variation distance to all $f$-divergences. A key obstacle in this endeavor is to define $f$-divergence on a subcollection of a $\sigma$-algebra that forms a $\pi$-system but not a $\sigma$-subalgebra. Overcoming this obstacle is a side contribution of our work. We will show that this notion of $f$-divergence on the $\pi$-system of rays preserves nearly all known properties of standard $f$-divergence, yields a novel integral representation of the Kolmogorov-Smirnov distance, and has a Glivenko-Cantelli theorem. We will also discuss the prospects of a Vapnik-Chervonenkis theory for $f$-divergence.

Paper Structure

This paper contains 11 sections, 34 theorems, 142 equations, 1 figure.

Key Result

Theorem 1.4

Let $\nu$ be a Borel probability measure and $\nu_n$ be the corresponding empirical measure, $n \in \mathbb{N}$. Then, almost surely,
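For orientation, the classical Glivenko--Cantelli conclusion can be written over the ray class as follows. This is a sketch under the assumption that $\mathcal{R}$ in Definition 1.3 is the collection of rays $(-\infty, x]$; the paper's own display may use different notation.

$$
\lim_{n \to \infty} \, \sup_{R \in \mathcal{R}} \, \lvert \nu_n(R) - \nu(R) \rvert = 0 .
$$

In one dimension this is the familiar statement $\sup_{x} \lvert F_n(x) - F(x) \rvert \to 0$, where $F_n$ and $F$ are the distribution functions of $\nu_n$ and $\nu$.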

Figures (1)

  • Figure 1: 40 level curves of total variation $\mathrm{D}_{\mathrm{TV}}$, total variation over rays $\mathrm{D}_{\mathrm{TV}}^{\mathcal{R}}$, Hellinger distance $\mathrm{D}_{\mathrm{H}}$, and Hellinger distance over rays $\mathrm{D}_{\mathrm{H}}^{\mathcal{R}}$ for fixed $\nu=[0.2,0.5,0.3]$ as $\mu$ ranges over the simplex of distributions on a three-element set.
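Some of the quantities in Figure 1 can be evaluated at a single point of the simplex. The sketch below uses the standard finite-set formula $\sum_i \nu_i f(\mu_i/\nu_i)$ for the $f$-divergence and treats total variation over rays as the Kolmogorov--Smirnov distance, as stated in the TL;DR. Several things are assumptions: the three points are taken to be ordered with rays as initial segments, the Hellinger generator is normalized as $f(t)=(\sqrt{t}-1)^2/2$ (the paper's convention may differ), and the choice of $\mu$ is hypothetical. Hellinger distance over rays relies on the paper's projected-density construction and is not attempted here.

```python
import math
from itertools import accumulate


def f_divergence(mu, nu, f):
    """Standard f-divergence sum_i nu_i * f(mu_i / nu_i); assumes every nu_i > 0."""
    return sum(q * f(p / q) for p, q in zip(mu, nu))


def tv_generator(t):
    """f(t) = |t - 1| / 2, which yields total variation."""
    return abs(t - 1.0) / 2.0


def hellinger_generator(t):
    """Assumed normalization: f(t) = (sqrt(t) - 1)^2 / 2."""
    return (math.sqrt(t) - 1.0) ** 2 / 2.0


def sup_over_rays(mu, nu):
    """max_k |mu({1..k}) - nu({1..k})|, with rays assumed to be initial segments."""
    return max(abs(a - b) for a, b in zip(accumulate(mu), accumulate(nu)))


nu = [0.2, 0.5, 0.3]  # fixed nu from the Figure 1 caption
mu = [0.4, 0.2, 0.4]  # hypothetical point of the simplex

tv = f_divergence(mu, nu, tv_generator)
tv_rays = sup_over_rays(mu, nu)
hellinger = f_divergence(mu, nu, hellinger_generator)

print("TV:", round(tv, 4))
print("TV over rays (KS):", round(tv_rays, 4))
print("Hellinger (assumed normalization):", round(hellinger, 4))
```

For this $\mu$ the supremum over rays is strictly smaller than the total variation, which is expected because the rays form a subcollection of all subsets.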

Theorems & Definitions (70)

  • Example 1.1
  • Example 1.2: Shortt
  • Definition 1.3: Rays
  • Theorem 1.4: Glivenko--Cantelli
  • Definition 1.5: $f$-divergence
  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Lemma 2.4
  • Proposition 3.1
  • ...and 60 more