Concentration bounds for intrinsic dimension estimation using Gaussian kernels
Martin Andersson
TL;DR
This work addresses intrinsic-dimension estimation from finite samples using a local Gaussian kernel sum. It introduces a local estimator \hat{d}(x,t) and establishes finite-sample concentration and anti-concentration bounds with explicit dependence on sample size, bandwidth, and local geometry. The main contributions are rigorous finite-sample bounds for a Gaussian-kernel-based estimator, a Berry-Esseen-based anti-concentration analysis, and a derivative-based bandwidth selection heuristic, all validated numerically on synthetic manifolds. The results give a principled way to quantify uncertainty in dimension estimates and to guide parameter choice on real data. The work also outlines directions for tighter bounds and for extensions to broader kernels and non-integer dimensions.
Abstract
We prove finite-sample concentration and anti-concentration bounds for dimension estimation using Gaussian kernel sums. Our bounds provide explicit dependence on sample size, bandwidth, and local geometric and distributional parameters, characterizing precisely how regularity conditions govern statistical performance. We also propose a bandwidth selection heuristic using derivative information, which shows promise in numerical experiments.
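To make the idea of a dimension estimate built from a Gaussian kernel sum concrete, here is a minimal sketch. It is not necessarily the estimator analyzed in the paper: it assumes the kernel sum S(x,t) = sum_i exp(-|x - X_i|^2 / (2t)) and uses the heuristic that S(x,t) scales like t^{d/2} for small t on a d-dimensional manifold, so that 2t d/dt log S(x,t) approximates d. The function names, the bandwidth value, and the flat test patch are all illustrative assumptions.

```python
import math
import random


def kernel_dimension_estimate(points, x, t):
    """Illustrative log-derivative estimate 2 * t * d/dt log S(x, t).

    Assumes S(x, t) = sum_i exp(-|x - X_i|^2 / (2 t)); differentiating gives
    2 * t * S'(t) / S(t) = sum_i w_i * r_i^2 / (t * sum_i w_i).
    """
    num = 0.0
    den = 0.0
    for p in points:
        r2 = sum((a - b) ** 2 for a, b in zip(p, x))  # squared distance to x
        w = math.exp(-r2 / (2.0 * t))                 # Gaussian kernel weight
        num += w * r2
        den += w
    if den == 0.0:
        raise ValueError("all kernel weights underflowed; increase t")
    return num / (t * den)


def sample_flat_patch(n, d, ambient, rng):
    """Hypothetical test data: uniform points on [-1, 1]^d embedded in R^ambient."""
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(d)] + [0.0] * (ambient - d)
        for _ in range(n)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
points = sample_flat_patch(n=20000, d=2, ambient=5, rng=rng)
x = [0.0] * 5           # query point at the center of the patch
d_hat = kernel_dimension_estimate(points, x, t=0.01)
print(round(d_hat, 2))
```

On this flat two-dimensional patch the printed value should land near 2. The bandwidth t controls the trade-off that finite-sample bounds of this kind quantify: a smaller t reduces bias from curvature and boundaries but leaves fewer points with non-negligible weight, which increases the variance.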
