Computing Diffusion Geometry

Iolo Jones; David Lanners

TL;DR

This work introduces a data-driven framework for diffusion geometry that rewrites calculus and geometry in terms of diffusion processes, enabling computation of gradients, differential forms, tensors, and topological invariants directly from point clouds. It relies on the carré du champ operator, computed from a data-driven Markov chain and heat kernel, to define a robust, manifold-free Riemannian-like structure and to formulate differential operators via weak formulations. The framework yields scalable methods for gradient fields, Hessians, Hodge Laplacians, geodesic distances, curvature, de Rham cohomology, circular coordinates, and Morse theory, with a focus on stability, regularisation, and convergence in noisy, high-dimensional data. Significantly, it outperforms traditional persistent homology in several tasks, provides interpretable canonical harmonic forms for topology, and includes a Python package for practical use on real data.
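As a rough illustration of the idea of computing a carré du champ from a data-driven Markov chain, the sketch below uses one common discretisation: $\Gamma(f,h)(i) \approx \frac{1}{2t}\sum_j P_{ij}\,(f_j - f_i)(h_j - h_i)$. The Gaussian kernel, the bandwidth `t`, the scaling by `1/t`, and the function names `markov_chain` and `carre_du_champ` are all assumptions made for this example. They are not taken from the paper or its Python package, whose construction may differ.

```python
import numpy as np


def markov_chain(points, bandwidth):
    """Row-stochastic matrix from a Gaussian kernel (an illustrative choice)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (4.0 * bandwidth))
    # Normalise each row so that it sums to 1.
    return kernel / kernel.sum(axis=1, keepdims=True)


def carre_du_champ(P, f, h, bandwidth):
    """Assumed discretisation: 0.5 * sum_j P_ij (f_j - f_i)(h_j - h_i) / bandwidth."""
    df = f[None, :] - f[:, None]  # df[i, j] = f_j - f_i
    dh = h[None, :] - h[:, None]  # dh[i, j] = h_j - h_i
    return 0.5 * np.sum(P * df * dh, axis=1) / bandwidth


# Toy data: 200 evenly spaced points on the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
points = np.column_stack([np.cos(theta), np.sin(theta)])
t = 0.01  # hypothetical bandwidth

P = markov_chain(points, t)
x, y = points[:, 0], points[:, 1]

gamma_xx = carre_du_champ(P, x, x, t)  # estimates |grad x|^2 along the circle
gamma_yy = carre_du_champ(P, y, y, t)  # estimates |grad y|^2 along the circle
gamma_xy = carre_du_champ(P, x, y, t)  # estimates <grad x, grad y>
```

On this toy example, `gamma_xx + gamma_yy` should come out close to 1 at every point, matching the fact that the tangent space of the circle is one-dimensional. The formula only uses the transition matrix and function values, so the same code runs unchanged on data that is not sampled from a manifold.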

Abstract

Calculus and geometry are ubiquitous in the theoretical modelling of scientific phenomena, but have historically been very challenging to apply directly to real data as statistics. Diffusion geometry is a new theory that reformulates classical calculus and geometry in terms of a diffusion process, allowing these theories to generalise beyond manifolds and be computed from data. This work introduces a new computational framework for diffusion geometry that substantially broadens its practical scope and improves its precision, robustness to noise, and computational complexity. We present a range of new computational methods, including all the standard objects from vector calculus and Riemannian geometry. We apply them to solve spatial PDEs and vector field flows, to compute geodesic (intrinsic) distances and curvature, and to build several new topological tools such as de Rham cohomology, circular coordinates, and Morse theory. These methods are data-driven, scalable, and can exploit highly optimised numerical tools for linear algebra.

Paper Structure

This paper contains 96 sections, 14 theorems, 205 equations, 40 figures, and 3 tables.

Introduction
The problem of data-driven calculus
Why the manifold hypothesis has the wrong name
Paper outline
Overview and main concepts
Vector calculus on $\mathbb{R}^n$ and manifolds
Diffusion geometry on $\mathbb{R}^n$ and manifolds
Computing the carré du champ
Gradient vector fields
General recipe for computing diffusion geometry
Functions, vector fields, forms, and tensors
Carré du champ and measure from a Markov chain
Constructing a Markov chain from a kernel
Measure
Carré du champ
...and 81 more sections

Key Result

Proposition 0

If $f, h : \mathbb{R}^d \to \mathbb{R}$ are differentiable functions then

Figures (40)

Figure 1: A function $f$ represents a signal on the point cloud (blue denotes negative values, and red positive). We use diffusion geometry to compute its gradient vector field $\nabla f$ with respect to the data geometry.
Figure 2: The height function $h$ measures elevation on the torus. It has a maximum at the top, a minimum at the bottom, and two saddle points in between. The gradient $\nabla h$ reveals the flow into and out of these critical points. We compute the Hessian $H(h)$, which measures the expansion and contraction of $h$ in a $2 \times 2$ matrix at each point. We plot its eigenvectors, which indicate the directions of expansion, coloured by their eigenvalues, which measure the rate of expansion (so negative implies contraction). These tell us the indices of the critical points: there is expansion in both directions at a minimum (degree 0), contraction in both directions at a maximum (degree 2), and both expansion and contraction at a saddle (degree 1).
Figure 3: Vector fields on $\mathbb{R}^2$ and a manifold $\mathcal{M}$. In both cases, we construct a vector field $X$ by multiplying $\nabla x$ and $\nabla y$ by coefficient functions $f$ and $h$. The functions and vector fields on the right are the pullbacks of those on the left from $\mathbb{R}^2$ to the manifold $\mathcal{M}$ (i.e. we restrict functions to $\mathcal{M}$ and project vectors to the tangent spaces $T_p\mathcal{M}$).
Figure 4: Gradients of functions on manifolds and non-manifolds. The top row shows a function $f$ on different spaces, and the bottom row shows its gradient $\nabla f$. The first column is in $\mathbb{R}^2$ and the second column is a 1-dimensional manifold $\mathcal{M}$, where we can compute $\nabla f$ from $f$ exactly. The third column uses diffusion geometry to estimate $\nabla f$ on $\mathcal{M}$ from a sample of data, using the carré du champ formula. The fourth column shows non-manifold data, where we can still use diffusion geometry to compute a sensible notion of gradient in this more general setting. The data is not from a manifold, because it is both 1d and 2d in different regions (2d on the patch on the left, and 1d elsewhere), and has a branching point on the right where three 1-dimensional paths meet.
Figure 5: Spanning sets for functions and vector fields in 2d. We can use eigenfunctions $\phi_i$ of the Markov chain as the smoothest possible function basis (left column). The two ambient coordinates $x$ and $y$ are an embedding of the data, so we can use their gradients $\nabla x$ and $\nabla y$ to construct a spanning set for the space of vector fields by multiplying them by basis functions (middle and right columns).
...and 35 more figures
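
The caption of Figure 5 describes using eigenfunctions $\phi_i$ of the Markov chain as a smooth function basis. A minimal sketch of how such eigenfunctions might be computed is given below. It assumes a Gaussian kernel with row normalisation, which is an illustrative choice and not necessarily the paper's construction. The helper name `markov_eigenfunctions` and the bandwidth value are hypothetical. The symmetric conjugate $D^{-1/2} K D^{-1/2}$ is used so that a symmetric eigensolver applies.

```python
import numpy as np


def markov_eigenfunctions(points, bandwidth, top_k):
    """Leading eigenpairs of P = D^{-1} K for a Gaussian kernel K (assumed setup)."""
    diffs = points[:, None, :] - points[None, :, :]
    kernel = np.exp(-np.sum(diffs ** 2, axis=-1) / (4.0 * bandwidth))
    degree = kernel.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P.
    sym = kernel * inv_sqrt[:, None] * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)  # ascending order
    order = np.argsort(eigvals)[::-1][:top_k]  # keep the largest top_k
    lam = eigvals[order]
    # Map eigenvectors of S back to right eigenvectors of P.
    phi = eigvecs[:, order] * inv_sqrt[:, None]
    P = kernel / degree[:, None]
    return P, lam, phi


# Toy data: 200 evenly spaced points on the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
points = np.column_stack([np.cos(theta), np.sin(theta)])

P, lam, phi = markov_eigenfunctions(points, bandwidth=0.01, top_k=5)
```

Each column of `phi` satisfies `P @ phi[:, i] ≈ lam[i] * phi[:, i]`. The leading eigenvalue is 1 with a constant eigenfunction, and later columns oscillate progressively faster, which is the sense in which they form a "smoothest possible" basis.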

Theorems & Definitions (35)

Proposition 0
Definition 4.1
Definition 4.2
Definition 4.3
Definition 4.4
Example 4.5
Lemma 4.6
proof
Corollary 4.7
proof
...and 25 more
