The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli

Doron Cohen, Aryeh Kontorovich, Roi Weiss

TL;DR

The paper analyzes Local Glivenko-Cantelli (LGC) for product Bernoulli measures and examines learning with estimators beyond the Empirical Mean Estimator (EME). It proves that, under non-pathological, decaying, and symmetry conditions, the LGC class is the largest learnable family for any estimator, and the EME achieves the minimax rate over such families. It further shows that allowing certain pathologies enables learning larger classes (e.g., union with constant sequences) via a relaxation construction, and provides a conjecture and open problems toward even richer extensions. The approach combines information-theoretic lower bounds (Fano) with constructive estimators and testing-based schemes, yielding both sharp lower bounds and practical estimators with provable consistency. The results clarify fundamental limits of distribution-dependent uniform convergence in high dimensions and illuminate when structure-aware estimators can outperform the standard EME.

Abstract

We revisit the recently introduced Local Glivenko-Cantelli setting, which studies distribution-dependent uniform convergence rates of the Empirical Mean Estimator (EME). In this work, we investigate generalizations of this setting where arbitrary estimators are allowed rather than just the EME. Can a strictly larger class of measures be learned? Can better risk decay rates be obtained? We provide exhaustive answers to these questions, which are both negative, provided the learner is barred from exploiting some infinite-dimensional pathologies. On the other hand, allowing such exploits does lead to a strictly larger class of learnable measures.

Paper Structure

This paper contains 25 sections, 6 theorems, 68 equations, and 3 figures.

Key Result

Theorem 1

Suppose that $\mathcal{P}\subset[0,1]^\mathbb{N}$ defines a family of product distributions as in (eq:mup) and furthermore [...]. Then $\mathcal{P}\subseteq\dot{\mathsf{LGC}}$.

Figures (3)

  • Figure 1: Illustration of the step profile construction for $p^{(k)}$ (top) and the special case for $p^{(J+1)}$ (bottom). Each bar represents the value of $p_{j}^{(k)}$ at position $j$. Values are shown above the bars.
  • Figure 2: Average supremum deviation $\Delta_n$ as a function of sample size $n$ on a log-log scale for varying $q$ values ($q = 0.1, 0.2, 0.05, 0.01, 0.005, 0.002$). Empirical results (dashed lines) are averaged over $J = 100$, $1000$, and $10000$ repetitions and are compared to theoretical predictions (solid lines).
  • Figure 3: Error comparison between the EME and the simple average estimator for varying sample sizes $n$ under different distributions: uniform, triangular, Beta(2,2), exponential, $1/n$, and Gaussian. Results are plotted for $k \in \{10, 50, 100, 500\}$ to illustrate the effect of averaging.
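
The quantity plotted in Figure 2 can be illustrated with a small simulation. The sketch below is not the authors' experimental code: it assumes that $\Delta_n$ denotes the expected supremum deviation $\sup_j |\hat p_j - p_j|$ of the EME, truncates the infinite sequence to finitely many coordinates, and uses a hypothetical decaying sequence `p`. The function names, the seed, and the repetition count are illustrative choices only.

```python
import random


def empirical_mean(samples):
    """Coordinate-wise mean of a list of equal-length 0/1 vectors (the EME)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def sup_deviation(p, n, rng):
    """Draw n i.i.d. vectors from the product Bernoulli(p) measure and
    return max_j |p_hat_j - p_j|, where p_hat is the empirical mean."""
    samples = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(n)]
    p_hat = empirical_mean(samples)
    return max(abs(a - b) for a, b in zip(p_hat, p))


def average_sup_deviation(p, n, repetitions, seed=0):
    """Monte Carlo estimate of the expected supremum deviation."""
    rng = random.Random(seed)
    total = sum(sup_deviation(p, n, rng) for _ in range(repetitions))
    return total / repetitions


if __name__ == "__main__":
    # Hypothetical example: 50 coordinates with p_j decaying like 1/j.
    # The paper's setting is infinite-dimensional; truncation is an assumption.
    p = [1.0 / (j + 2) for j in range(50)]
    for n in (10, 100, 1000):
        print(n, average_sup_deviation(p, n, repetitions=20))
```

Plotting the printed values against `n` on a log-log scale gives a curve of the kind Figure 2 describes, although the sequences and parameters used in the paper's experiments may differ.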

Theorems & Definitions (6)

  • Theorem 1: Expanding $\mathsf{LGC}$
  • Theorem 2: Minimax bound
  • Theorem 3: Relaxing decay and symmetry
  • Proposition 1: Relaxing decay
  • Lemma 1: Yu (1997), "Assouad, Fano, and Le Cam"
  • Lemma 2: van Handel (2014), "Probability in High Dimension", Problem 5.1a