How high is 'high'? Rethinking the roles of dimensionality in topological data analysis and manifold learning

Hannah Sansford; Nick Whiteley; Patrick Rubin-Delanchy

TL;DR

The paper reconciles practical data geometry with statistical theory by introducing a generalised Hanson-Wright inequality and distinguishing three notions of dimensionality: ambient intrinsic dimension $p_{\mathrm{int}}$, correlation rank $r$, and latent intrinsic dimension $d$. It develops a random function model that links observed point-clouds $\mathcal{Y}_n$ to latent manifolds $\mathcal{M}$ via Mercer kernels, and establishes persistence-diagram consistency without requiring $p\gg n$. The authors also provide practical diagnostics for isometry between the latent space $\mathcal{Z}$ and the observed geometry $\mathcal{M}$, and present evidence that grid-cell activity encodes a geometrically faithful map of physical space, with toroidal structure that is isometric to physical space under an appropriate model (Model 3). Overall, the work shows that latent topology and manifold structure can emerge once $p_{\mathrm{int}}\gg \log n$, where $n$ is the sample size, and that isometry between observed neural activity and real space can be detected in practice.
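For orientation, the classical Hanson-Wright inequality (in the form due to Rudelson and Vershynin) is recalled here. This is the standard result, not the paper's generalised version, whose exact statement is not reproduced on this page. The symbols $X$, $A$, $K$, $c$ and $t$ are introduced only for this illustration. Let $X=(X_1,\dots,X_m)$ have independent, mean-zero components with sub-Gaussian norms $\|X_i\|_{\psi_2}\le K$, and let $A$ be a fixed $m\times m$ matrix. Then, for every $t\ge 0$,

$$
\mathbb{P}\left(\left|X^{\top}AX-\mathbb{E}\,X^{\top}AX\right|>t\right)\;\le\;2\exp\left(-c\,\min\left\{\frac{t^{2}}{K^{4}\|A\|_{F}^{2}},\;\frac{t}{K^{2}\|A\|_{\mathrm{op}}}\right\}\right),
$$

where $c>0$ is an absolute constant, $\|A\|_{F}$ is the Frobenius norm and $\|A\|_{\mathrm{op}}$ is the operator norm. Bounds of this type control how tightly quadratic forms, such as squared distances between data points, concentrate around their means.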

Abstract

We present a generalised Hanson-Wright inequality and use it to establish new statistical insights into the geometry of data point-clouds. In the setting of a general random function model of data, we clarify the roles played by three notions of dimensionality: ambient intrinsic dimension $p_{\mathrm{int}}$, which measures total variability across orthogonal feature directions; correlation rank, which measures functional complexity across samples; and latent intrinsic dimension, which is the dimension of manifold structure hidden in data. Our analysis shows that in order for persistence diagrams to reveal latent homology and for manifold structure to emerge it is sufficient that $p_{\mathrm{int}}\gg \log n$, where $n$ is the sample size. Informed by these theoretical perspectives, we revisit the ground-breaking neuroscience discovery of toroidal structure in grid-cell activity made by Gardner et al. (Nature, 2022): our findings reveal, for the first time, evidence that this structure is in fact isometric to physical space, meaning that grid cell activity conveys a geometrically faithful representation of the real world.
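As a rough illustration of "total variability across orthogonal feature directions", the sketch below computes the effective rank $\operatorname{tr}(\Sigma)/\|\Sigma\|_{\mathrm{op}}$ of a sample covariance and compares it with $\log n$. That this ratio coincides with the paper's $p_{\mathrm{int}}$ is an assumption made for illustration only. The function name `effective_rank` and the simulated data are likewise hypothetical.

```python
import numpy as np


def effective_rank(X):
    """Return trace(S) / lambda_max(S) for the sample covariance S of the rows of X.

    ASSUMPTION: this ratio is used as a stand-in for an 'intrinsic dimension';
    it is not claimed to be the paper's definition of p_int.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    # Squared singular values of the centred data are the covariance
    # eigenvalues up to a common factor, which cancels in the ratio.
    eig = np.linalg.svd(Xc, compute_uv=False) ** 2
    return float(eig.sum() / eig.max())


rng = np.random.default_rng(0)
n, p = 500, 200

# Isotropic features: variability is spread over many directions.
iso = rng.standard_normal((n, p))

# Same features, but one direction dominates the total variance.
scales = np.ones(p)
scales[0] = 30.0
spiked = iso * scales

r_iso = effective_rank(iso)
r_spiked = effective_rank(spiked)
log_n = float(np.log(n))

print(f"log n = {log_n:.2f}")
print(f"effective rank, isotropic: {r_iso:.1f}")
print(f"effective rank, spiked:    {r_spiked:.1f}")
```

Both simulated data sets have the same ambient dimension $p$, yet only the isotropic one has an effective rank well above $\log n$. This is meant to convey why a variance-based notion of dimension, rather than $p$ itself, can be the relevant quantity.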

