Curvature of high-dimensional data

Jiayi Chen, Mohammad Javad Latifi Jebelli, Daniel N. Rockmore

TL;DR

This work tackles the challenge of estimating local curvature from high-dimensional, noisy data by formulating a probabilistic decoding framework that treats naive curvature estimates as random variables whose distributions depend on noise and sampling density. The authors derive explicit pushforward formulas for the absolute variation curvature under von Mises–Fisher noise, extend these results to mixtures, and show how to decode the true curvature via maximum likelihood. Numerical experiments on spheres up to dimension $m=12$ demonstrate that naive curvature estimates are biased and that the decoded curvature closely tracks the true curvature, effectively mitigating high-dimensional bias. The approach provides a principled pathway to reliable extrinsic curvature estimation in high-dimensional data and offers a general methodology applicable to other curvature-like quantities arising from local tangent-space estimates.

Abstract

We consider the problem of estimating curvature where the data can be viewed as a noisy sample from an underlying manifold. For manifolds of dimension greater than one there are multiple definitions of local curvature, each suggesting a different estimation process for a given data set. Recently, there has been progress in proving that estimates of "local point cloud curvature" converge to the related smooth notion of local curvature as the density of the point cloud approaches infinity. Herein we investigate practical limitations of such convergence theorems and discuss the significant impact of bias in such estimates as reported in recent literature. We provide theoretical arguments for the fact that bias increases drastically in higher dimensions, so much so that in high dimensions, the probability that a naive curvature estimate lies in a small interval near the true curvature could be near zero. We present a probabilistic framework that enables the construction of more accurate estimators of curvature for arbitrary noise models. The efficacy of our technique is supported with experiments on spheres of dimension as large as twelve.

Paper Structure

This paper contains 12 sections, 5 theorems, 32 equations, 7 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

(Noise pushforward for angle computation) Let $\mu_0,\mu \in S^m$, and let $\alpha$ be the angle between $\mu_0$ and $\mu$. Let $X$ be an $S^m$-valued random variable distributed according to the von Mises–Fisher distribution with (Fréchet) mean $\mu$ and concentration parameter $\kappa>0$. Then, where $\theta \in [0,\pi]$ and $I_k$ is the modified Bessel function of the first kind.
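
To build intuition for the setting of Theorem 1, the sketch below covers only the special case $\mu_0 = \mu$ (that is, $\alpha = 0$), which is a standard property of the von Mises–Fisher distribution and not the theorem's general formula. In that case the angle $\theta$ between $X$ and its mean direction has density proportional to $e^{\kappa\cos\theta}\sin^{m-1}\theta$ on $[0,\pi]$, with normalizing constant $\sqrt{\pi}\,\Gamma(m/2)\,I_{(m-1)/2}(\kappa)/(\kappa/2)^{(m-1)/2}$. The code normalizes numerically instead of evaluating the Bessel function, and the function names `angle_density_on_grid` and `mode_angle` are hypothetical, chosen for illustration only.

```python
import math


def angle_density_on_grid(m, kappa, n=20000):
    """Density of the angle between X ~ vMF(mu, kappa) on S^m and mu itself.

    Assumption: special case mu_0 = mu (alpha = 0). The unnormalized density is
    exp(kappa * cos(theta)) * sin(theta)**(m - 1); it is normalized here with a
    midpoint rule on [0, pi], so no Bessel function is needed.
    """
    h = math.pi / n
    thetas = [(i + 0.5) * h for i in range(n)]  # midpoints keep sin(theta) > 0
    logs = [kappa * math.cos(t) + (m - 1) * math.log(math.sin(t)) for t in thetas]
    top = max(logs)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights) * h
    return thetas, [w / total for w in weights]


def mode_angle(m, kappa):
    """Location of the peak of the angle density, in radians.

    Setting the derivative of kappa*cos(theta) + (m-1)*log(sin(theta)) to zero
    gives kappa*c**2 + (m-1)*c - kappa = 0 with c = cos(theta); take the
    positive root.
    """
    a = m - 1
    c = (math.sqrt(a * a + 4.0 * kappa * kappa) - a) / (2.0 * kappa)
    return math.acos(min(1.0, c))  # clamp guards against rounding just above 1


if __name__ == "__main__":
    kappa = 100.0  # illustrative concentration, not a value from the paper
    for m in (3, 5, 10, 20, 50):
        print(f"m={m:3d}  most likely angle = {mode_angle(m, kappa):.4f} rad")
```

With $\kappa$ held fixed, the most likely angle moves away from zero as $m$ grows, even though the density of $X$ itself is maximal at $\mu$. This is the same volume effect that makes naive estimates built from noisy directions increasingly biased in high dimensions.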

Figures (7)

  • Figure 1: Visualizations of noisy manifolds.
  • Figure 2: Histograms of curvature estimates computed from 50,000 random samples of points on spheres $S^3$, $S^5$, and $S^{10}$ with radius 1, using parameters shown in Table S1 (in the SI). The plots illustrate how bias in the naive calculation of absolute variation curvature increases with dimension. The dashed line represents the true value of curvature for all cases.
  • Figure 3: Theoretical probability density of naive curvature for spheres of dimension $3,5,10,20,50$. This confirms the empirical observations in Figure 2.
  • Figure 4: Decoded Curvature (red) vs Naive Curvature (blue) for noisy point clouds near $S^m \subset \mathbb{R}^{m+1}$. Rows correspond to dimensions $m=3,5,10,12$, respectively; columns correspond to the noise levels $\sigma = 0.01,0.02,0.05$, respectively. The radius is $r=2$ in all cases. Generally, the decoded curvature is peaked around or very near the true curvature value while the naive estimates get progressively worse with increased dimension or increased noise.
  • Figure 5: Left: decoded curvature estimates (red) and naive curvature estimates (blue) for $S^{12}$ and radius $2$, with noise levels $\sigma = 0.02, 0.05$. Right: distribution of tangent space approximation errors, with empirical histogram (blue) and the MLE fit (red line) from a mixture of vMF components (dashed lines). We see how in both of these high-dimensional cases the decoded estimate is strongly peaked around the correct curvature ($0.5$ for a sphere of radius $2$). The fits illustrate how the tangent space error distribution fit affects the bias in our decoded curvature estimation.
  • ...and 2 more figures

Theorems & Definitions (11)

  • Definition 1
  • Definition 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • Lemma 1
  • ...and 1 more