Concentration bounds for intrinsic dimension estimation using Gaussian kernels

Martin Andersson

TL;DR

This work addresses reliable intrinsic-dimension estimation from finite samples using a local Gaussian kernel sum. It introduces a local estimator \hat{d}(x,t) and establishes finite-sample concentration and anti-concentration bounds with explicit dependence on sample size, bandwidth, and local geometry, complemented by a derivative-based bandwidth heuristic. The main contributions are rigorous finite-sample bounds for a Gaussian-kernel-based estimator, a Berry-Esseen-based anti-concentration analysis, and a practical bandwidth selection method, all validated numerically on synthetic manifolds. The results offer a principled way to quantify uncertainty in dimension estimates and to inform parameter choice in real-data scenarios, and they outline avenues for tighter bounds and extensions to broader kernels and non-integer dimensions.

Abstract

We prove finite-sample concentration and anti-concentration bounds for dimension estimation using Gaussian kernel sums. Our bounds provide explicit dependence on sample size, bandwidth, and local geometric and distributional parameters, characterizing precisely how regularity conditions govern statistical performance. We also propose a bandwidth selection heuristic using derivative information, which shows promise in numerical experiments.

Paper Structure

This paper contains 17 sections, 10 theorems, 84 equations, 8 figures, 1 algorithm.

Key Result

Theorem 1

Let $X_1,\dots,X_n$ be independent random variables with finite variance such that $|X_i- \mathbb{E}[X_i]| \leq b$ almost surely for some $b>0$ and all $i\leq n$. Let $v=\sum_{i=1}^n \mathop{\mathrm{Var}}\nolimits[X_i]$. Then
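
The conclusion of the theorem is cut off in this excerpt. For orientation, a standard form of Bernstein's inequality under these hypotheses is given below. The deviation level $s$ is a symbol introduced here, and the paper's own statement may differ in its constants or be two-sided. For every $s>0$,

$$\mathbb{P}\left(\sum_{i=1}^n \bigl(X_i - \mathbb{E}[X_i]\bigr) \geq s\right) \leq \exp\left(-\frac{s^2}{2\,(v + bs/3)}\right).$$

Applying the same bound to $-X_i$ and taking a union bound gives the two-sided version with an extra factor of 2 on the right-hand side.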

Figures (8)

  • Figure 1: Same point set embedded as a 1D curve, 2D surface, and 3D volume, illustrating the need for regularity assumptions.
  • Figure 2: $(L,M,r)$-regular at $x$: $\Omega \cap B_r(x)$ projects orthogonally onto $U_x = x+T_x\Omega \cap B_r(x)$.
  • Figure 3: Scaling regimes of the kernel sum for unit ball in $\mathbb{R}^5$ with $n=100{,}000$ samples. Left: log-log plot showing linear regime with slope $d/2$. Right: derivative plateau indicating the linear regime.
  • Figure 4: Concentration of dimension estimation error for intrinsic dimension $d=3$. Panel (a) shows a uniform distribution on the unit ball in $\mathbb{R}^3$ (flat geometry) with bandwidth $t=0.000555 < t_0=0.000585$. Panel (b) shows an approximately uniform distribution on a spherical cap of a 3-sphere with radius $R=10$ and bandwidth $t=0.000768 < t_0=0.000808$. Solid lines represent empirical mean errors, dashed lines show theoretical concentration bounds at 90% confidence, and shaded regions indicate standard deviation.
  • Figure 5: Anti-concentration bounds on spherical cap. Experiments on a spherical cap of a 3-sphere with radius $R = 10$ (intrinsic dimension $d=3$, $L = 0.05$, $M = 0.005$) with bandwidth $t=0.000768 < t_0 = 0.000808$. The anti-concentration bounds show fundamental accuracy limits.
  • ...and 3 more figures
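
The slope-$d/2$ scaling described for Figure 3 can be illustrated with a minimal Python sketch. This is not the paper's estimator $\hat{d}(x,t)$, whose definition is not reproduced here. It assumes the kernel normalization $\exp(-\|x-X_i\|^2/(2t))$ and uses a simple finite difference of the log kernel sum between two bandwidths. The function names, the bandwidth ratio, and the value $t=0.01$ are illustrative assumptions.

```python
import numpy as np


def kernel_sum(points, x, t):
    """Gaussian kernel sum at x with bandwidth t.

    The normalization exp(-|x - X_i|^2 / (2 t)) is an assumption; any
    constant in place of 2 leaves the log-log slope unchanged.
    """
    sq_dists = np.sum((points - x) ** 2, axis=1)
    return np.sum(np.exp(-sq_dists / (2.0 * t)))


def slope_dimension_estimate(points, x, t, ratio=2.0):
    """Twice the finite-difference slope of log(kernel sum) against log(t).

    In the linear regime the kernel sum scales like t^(d/2), so doubling
    the slope between bandwidths t and ratio * t recovers d.
    """
    s_lo = kernel_sum(points, x, t)
    s_hi = kernel_sum(points, x, ratio * t)
    return 2.0 * (np.log(s_hi) - np.log(s_lo)) / np.log(ratio)


def sample_unit_ball(n, dim, rng):
    """Draw n points uniformly from the unit ball in R^dim."""
    directions = rng.standard_normal((n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.random(n) ** (1.0 / dim)
    return directions * radii[:, None]


rng = np.random.default_rng(0)
points = sample_unit_ball(100_000, 3, rng)
estimate = slope_dimension_estimate(points, np.zeros(3), t=0.01)
print(f"estimated dimension: {estimate:.2f}")
```

With the query point at the center of the ball and a bandwidth far smaller than the squared radius, boundary effects are negligible and the estimate should land close to 3. Bandwidths that are too small leave too few points with non-negligible kernel weight, and bandwidths that are too large pick up the boundary, which is why a bandwidth selection step matters.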

Theorems & Definitions (25)

  • Definition 3.1: Local $(L,M,r)$-regularity
  • Remark 3.2
  • Example 3.3: Circle of radius $R$
  • Remark 3.4
  • Theorem 1: Bernstein's inequality
  • Lemma 3.5: General Moment Bounds
  • Remark 3.6
  • Lemma 3.7: Multiplicative Bounds
  • Remark 3.8
  • Corollary 3.9: Properties of $W^+$ and $W^-$
  • ...and 15 more