Curvature of high-dimensional data

Jiayi Chen; Mohammad Javad Latifi Jebelli; Daniel N. Rockmore

TL;DR

This work tackles the challenge of estimating local curvature from high-dimensional, noisy data by formulating a probabilistic decoding framework that treats naive curvature estimates as random variables whose distributions depend on noise and sampling density. The authors derive explicit pushforward formulas for the absolute variation curvature under von Mises–Fisher noise, extend these results to mixtures, and show how to decode the true curvature via maximum likelihood. Numerical experiments on spheres up to dimension $m=12$ demonstrate that naive curvature estimates are biased and that the decoded curvature closely tracks the true curvature, effectively mitigating high-dimensional bias. The approach provides a principled pathway to reliable extrinsic curvature estimation in high-dimensional data and offers a general methodology applicable to other curvature-like quantities arising from local tangent-space estimates.

Abstract

We consider the problem of estimating curvature where the data can be viewed as a noisy sample from an underlying manifold. For manifolds of dimension greater than one there are multiple definitions of local curvature, each suggesting a different estimation process for a given data set. Recently, there has been progress in proving that estimates of "local point cloud curvature" converge to the related smooth notion of local curvature as the density of the point cloud approaches infinity. Herein we investigate practical limitations of such convergence theorems and discuss the significant impact of bias in such estimates, as reported in recent literature. We provide theoretical arguments that bias increases drastically in higher dimensions, so much so that in high dimensions the probability that a naive curvature estimate lies in a small interval near the true curvature could be near zero. We present a probabilistic framework that enables the construction of more accurate estimators of curvature for arbitrary noise models. The efficacy of our technique is supported by experiments on spheres of dimension as large as twelve.
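
The dimension dependence of the bias can be illustrated with a toy experiment. The sketch below is not the authors' absolute variation curvature estimator, and it does not use their von Mises–Fisher noise model or maximum-likelihood decoding. It assumes a unit sphere $S^m \subset \mathbb{R}^{m+1}$ with known center, isotropic Gaussian noise of known standard deviation `sigma`, and a simple moment correction in place of decoding. The function name and all parameters are hypothetical. Under these assumptions the expected squared norm of a noisy point is $1 + \sigma^2 (m+1)$, so the naive curvature estimate (reciprocal of the root mean squared norm) shrinks as $m$ grows, while subtracting the known noise term recovers a value close to the true curvature of 1.

```python
import numpy as np


def naive_and_corrected_curvature(m, sigma, n_points=20000, seed=0):
    """Toy illustration (assumed setup, not the paper's estimator).

    Samples points uniformly on the unit sphere S^m in R^(m+1), adds
    isotropic Gaussian noise with standard deviation `sigma`, and returns
    (naive, corrected) estimates of the curvature 1/radius.
    """
    rng = np.random.default_rng(seed)
    d = m + 1  # ambient dimension

    # Uniform points on the unit sphere: normalize standard Gaussian vectors.
    x = rng.standard_normal((n_points, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)

    # Noisy observations.
    y = x + sigma * rng.standard_normal((n_points, d))

    # Mean squared distance from the (assumed known) center at the origin.
    mean_sq = np.mean(np.sum(y**2, axis=1))

    # Naive estimate: treat the noisy radius as the true radius.
    naive = 1.0 / np.sqrt(mean_sq)

    # Moment correction: E|y|^2 = r^2 + sigma^2 * d, so remove the noise term.
    corrected = 1.0 / np.sqrt(mean_sq - sigma**2 * d)
    return naive, corrected


if __name__ == "__main__":
    for m in (2, 6, 12):
        naive, corrected = naive_and_corrected_curvature(m, sigma=0.3)
        print(f"m={m:2d}  naive={naive:.3f}  corrected={corrected:.3f}")
```

The correction here works only because the noise law and its scale are assumed known exactly; the framework described in the abstract addresses the more general situation by modeling the full distribution of the naive estimate.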
