Density Estimation on Rectifiable Sets
Jack Kendrick
TL;DR
This work extends kernel density estimation to data supported on $d$-rectifiable sets. Classical KDE converges slowly in high ambient dimensions, so the estimator ${\hat{p}}_n(x) = \frac{1}{n h_n^d}\sum_{i=1}^n K(\|X_i-x\|/h_n)$ is normalized by the intrinsic dimension $d$ rather than the ambient dimension $D$. Under an approximate-tangent-space condition with parameter $m>0$, the estimator's mean-squared error decays as ${\rm MSE}[{\hat{p}}_n(x)] = O\left(\frac{1}{n^{2m/(d+2m)}}\right)$, recovering known results on manifolds and extending them to algebraic and semi-algebraic sets. When the support is locally a smooth manifold almost everywhere, and $p$ and $K$ are sufficiently smooth, the method achieves the classical rate ${\rm MSE} = O\left(\frac{1}{n^{4/(d+4)}}\right)$ with $h_n \asymp n^{-1/(d+4)}$ (the general bound with $m=2$), reflecting the better tangent-space approximation available in that case. A numerical example on $d$-sparse data shows that the convergence rate depends on the intrinsic dimension $d$ but not on the ambient dimension $D$, which makes the estimator applicable to high-dimensional data with low-dimensional structure, such as sparse or low-rank models.
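As a heuristic check on the rates above, suppose the squared bias is of order $h_n^{2m}$ and the variance is of order $1/(n h_n^d)$. These orders are the usual bias-variance heuristic and are an assumption here, not a statement quoted from the paper. Balancing the two terms gives:

$$
{\rm MSE}[{\hat{p}}_n(x)] \lesssim h_n^{2m} + \frac{1}{n h_n^d},
\qquad
h_n \asymp n^{-1/(d+2m)}
\;\Longrightarrow\;
{\rm MSE}[{\hat{p}}_n(x)] = O\left(\frac{1}{n^{2m/(d+2m)}}\right).
$$

Setting $m=2$ gives $h_n \asymp n^{-1/(d+4)}$ and ${\rm MSE} = O\left(\frac{1}{n^{4/(d+4)}}\right)$, which matches the classical rate.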
Abstract
Kernel density estimation is a popular method for estimating unknown probability distributions. However, the convergence of these classical estimators to the true density slows down in high dimensions. Moreover, they do not define meaningful probability distributions when the intrinsic dimension of the data is much smaller than its ambient dimension. We build on previous work on density estimation on manifolds to show that a modified kernel density estimator converges to the true density on $d$-rectifiable sets. As a special case, we consider algebraic varieties and semi-algebraic sets and prove a convergence rate in this setting. We conclude the paper with a numerical experiment illustrating the convergence of this estimator on sparse data.
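The following is a minimal sketch of such an estimator on sparse data, not the paper's own experiment. It uses the form $\frac{1}{n h^d}\sum_i K(\|X_i - x\|/h)$, with distances measured in the ambient space and normalization by the intrinsic dimension $d$. The function names (`kde_intrinsic`, `sample_sparse`), the Epanechnikov-type kernel, the sampling scheme (uniformly random support with standard normal entries), and the fixed bandwidth `h = 0.5` are all illustrative assumptions.

```python
import math
import numpy as np

def unit_ball_volume(d):
    # Volume of the unit ball in R^d.
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def epanechnikov_profile(r, d):
    # Radial kernel K(r) = c_d * (1 - r^2) on [0, 1], zero elsewhere,
    # with c_d chosen so that K(||u||) integrates to 1 over R^d.
    c_d = (d + 2) / (2 * unit_ball_volume(d))
    return c_d * np.clip(1.0 - r ** 2, 0.0, None)

def kde_intrinsic(X, x, h, d):
    # p_hat(x) = 1 / (n h^d) * sum_i K(||X_i - x|| / h).
    # Distances are ambient (R^D); the normalization uses d, not D.
    n = X.shape[0]
    r = np.linalg.norm(X - x, axis=1) / h
    return epanechnikov_profile(r, d).sum() / (n * h ** d)

def sample_sparse(n, D, d, rng):
    # Assumed sampling scheme: each row has a uniformly random support
    # of size d and i.i.d. N(0, 1) values on that support.
    X = np.zeros((n, D))
    # Sorting i.i.d. uniforms yields a uniformly random permutation per row.
    support = np.argsort(rng.random((n, D)), axis=1)[:, :d]
    rows = np.arange(n)[:, None]
    X[rows, support] = rng.standard_normal((n, d))
    return X

rng = np.random.default_rng(0)
D, d, n = 6, 2, 200_000
X = sample_sparse(n, D, d, rng)

# Query point on the coordinate plane spanned by the first d axes.
x = np.zeros(D)
x[:d] = 1.0
h = 0.5  # illustrative bandwidth, smaller than the distance to other planes

p_hat = kde_intrinsic(X, x, h, d)
# Density with respect to d-dimensional Hausdorff measure under the
# assumed sampling scheme: N(0, I_d) density at (1, ..., 1), divided by
# the number of possible supports.
p_true = math.exp(-d / 2) / ((2 * math.pi) ** (d / 2) * math.comb(D, d))
print(p_hat, p_true)
```

The kernel is compactly supported on purpose. Every sample lying on a different coordinate plane is at distance at least 1 from the query point, so with `h = 0.5` it contributes exactly zero, and the estimate only sees samples on the plane through `x`.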
