Estimating the persistent homology of $\mathbb{R}^n$-valued functions using function-geometric multifiltrations

Ethan André, Jingyi Li, David Loiseaux, Steve Oudot

TL;DR

This work extends the scalar persistent-homology estimation framework to vector-valued functions by developing function-geometric multifiltrations and proving that the main estimator $H_*(\mathcal{R}^{\delta\to 2\delta}(\mathscr{f}|_P))$ remains an $\omega(2\delta)$-approximation of the target $H_*(\mathscr{f})$ under regularity assumptions. It introduces fixed-radius, varying-radius, and Kan-extension-based multiparameter approaches to estimating the full $n$-parameter persistence of vector-valued functions, with robustness to input noise and statistical convergence guarantees. The authors provide an algorithm to compute presentations of the image of morphisms between persistence modules, enabling practical computation of the estimators and their invariants (e.g., multigraded Betti numbers), and implement it in the multipers library. They establish consistency and (quasi-)minimax convergence rates under standard sampling models, for both known and unknown regularity of the sampling measure, and use experiments on synthetic and real biological data to validate the theoretical results and assess the practicality of the approach.

Abstract

Given an unknown $\mathbb{R}^n$-valued function $f$ on a metric space $X$, can we approximate the persistent homology of $f$ from a finite sampling of $X$ with known pairwise distances and function values? This question has been answered in the case $n=1$, assuming $f$ is Lipschitz continuous and $X$ is a sufficiently regular geodesic metric space, and using filtered geometric complexes with fixed scale parameter for the approximation. In this paper we answer the question for arbitrary $n$, under similar assumptions and using function-geometric multifiltrations. Our analysis offers a different view on these multifiltrations by focusing on their approximation properties rather than on their stability properties. We also leverage the multiparameter setting to provide insight into the influence of the scale parameter, whose choice is central to this type of approach. From a practical standpoint, we show that our approximation results are robust to input noise, and that function-geometric multifiltrations have good statistical convergence properties. We also provide an algorithm to compute our estimators, and we use its implementation to conduct extensive experiments, on both synthetic and real biological data, in order to validate our theoretical results and to assess the practicality of our approach.
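As a rough illustration of the input described above (pairwise distances plus $\mathbb{R}^n$-valued function values on a finite sample), the sketch below assigns to each edge of the sample a scale and a function grade, in the spirit of a function-Rips multifiltration. The function names, and the convention that an edge is present at scale $\delta$ when its length is at most $\delta$, are assumptions made for this example; they are not taken from the paper or from the multipers library.

```python
def function_rips_edge_grades(dist, values):
    """Grade every edge {i, j} of the sample.

    dist   : square list of pairwise distances (hypothetical input format)
    values : list of R^n-valued function values, one tuple per point

    Returns {(i, j): (scale, grade)} where `scale` is the distance between
    the endpoints and `grade` is the coordinatewise maximum of their
    function values, i.e. the smallest t in R^n at which both endpoints
    lie in the sublevel set {f <= t}.
    """
    n = len(dist)
    grades = {}
    for i in range(n):
        for j in range(i + 1, n):
            grade = tuple(max(a, b) for a, b in zip(values[i], values[j]))
            grades[(i, j)] = (float(dist[i][j]), grade)
    return grades


def fixed_radius_edges(grades, delta):
    """Keep the edges present at fixed scale `delta` (assumed convention:
    an edge appears when its length is at most delta)."""
    return {e: g for e, (scale, g) in grades.items() if scale <= delta}


# Three points on a line with an R^2-valued function.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
values = [(0.0, 3.0), (1.0, 2.0), (5.0, 0.0)]
grades = function_rips_edge_grades(dist, values)
edges_at_delta = fixed_radius_edges(grades, 1.0)
```

Computing the estimator itself additionally requires the image of the map in homology between the complexes at scales $\delta$ and $2\delta$, which this sketch does not attempt.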

Paper Structure

This paper contains 29 sections, 29 theorems, 106 equations, 15 figures, 1 algorithm.

Key Result

Lemma 2.1

Let $P\subseteq P'$ be finite sets of points in a geodesic metric space $(X, d_X)$. For any $\delta \leq \delta' < \varrho_X$, the following square commutes, where the isomorphisms are the ones provided by the Nerve Lemma [hatcher], and where the other two arrows are induced in homology by inclusi

Figures (15)

  • Figure 1: Left: an illustration of the left Kan extension $\mathrm{Lan}_{\iota}H_*(\mathscr{f}):\mathbb{R}_{\geq 0}\times\mathbb{R}^n\to\mathbf{vec}$, with identity maps horizontally and the structural morphisms of $H_*(\mathscr{f})$ vertically. Right: (case $n=1$) the left Kan extension of the interval module $\mathbbm{k}^{[1,2]}$ is the interval module $\mathbbm{k}^{\mathbb{R}_{\geq 0}\times[1,2]}$ (in red).
  • Figure 2: Left: the vertical height function on a sampled unit circle in the plane. Distances on the circle are given by arclength. Center: $H_1(\mathscr{f})$ has a single interval summand (in magenta), starting at height $1$, which extends to the free module $\mathrm{Lan}_\iota H_1(\mathscr{f})$ generated at $(0,1)$ in $\mathbb{R}_{\geq 0}\times\mathbb{R}$. The modules $H_1(\mathcal{R}^{\bullet}(\mathscr{f}|_P))$ (in yellow) and $H_1(\mathcal{R}^{2\bullet}(\mathscr{f}|_P))$ (in green) are interval modules in this simple scenario. Right: the estimator $H_1(\mathcal{R}^{\bullet\to 2\bullet}(\mathscr{f}|_P))$ (in blue), which is also an interval module, approximates the target $\mathrm{Lan}_\iota H_1(\mathscr{f})$ in the vertical interleaving distance within any slab $[2\varepsilon, \delta_0]\times \mathbb{R}$ with $\delta_0<\varrho_X/2$, as per Theorem \ref{thm:estimator_varying_radius}. In turn, the vertical interleaving between the two modules implies a vertical matching between their multigraded Betti numbers within the slab (illustrated by green arrows in the close-up view), as per Corollary \ref{cor:stab_inv_rprn}.
  • Figure 3: Contrasting Theorem \ref{thm:estimator_varying_radius} with Theorem \ref{thm:estimator_fixed_radius}. (a): the input is $P=P_1\sqcup P_2$, where $P_1$ and $P_2$ are two point clouds regularly sampled from two disjoint squares $X_1, X_2$ in the plane. Distances within each square are shortest-path distances along the boundary, while distances between squares are infinite. Here $\mathscr{f}$ is the vertical height function, and its persistent homology in degree $1$ is considered. (b): the estimator $H_*(\mathcal{R}^{\bullet\to 2\bullet}(\mathscr{f}|_P))$ is an interval-decomposable module whose summands have half-open rectangle supports, respectively $R_1$ (in yellow) and $R_2$ (in red). The scalings of $X_1, X_2$, and their respective sampling densities, have been adjusted so that $R_1\cap R_2=\emptyset$ while $R_1\cup R_2$ is a single rectangle $R'$ shown in subfigure (d). In turn, the interval module with support $R'$ can be realized as the degree $1$ persistent homology of another sample $P'$ from a square in the plane, shown in subfigure (c). Theorem \ref{thm:estimator_varying_radius} guarantees that $H_1(\mathcal{R}^{\bullet\to 2\bullet}(\mathscr{f}|_P))$ is interleaved with $\mathrm{Lan}_\iota H_1(\mathscr{f}|_{X_1})$ over $R_1$ and with $\mathrm{Lan}_\iota H_1(\mathscr{f}|_{X_2})$ over $R_2$, leading to the two summands in (b). By contrast, Theorem \ref{thm:estimator_fixed_radius} only guarantees interleavings within the vertical slices, which is not sufficient to discriminate the module with two summands in (b) from the module with a single summand in (d).
  • Figure 5: (a) A sample $P$ from a space $X=X_1\sqcup X_2$ composed of two circles equipped with geodesic distances, each uniformly sampled with distinct radii and concentration levels. The target function $\mathscr{f}\colon X \to \mathbb{R}$ is the height function on the two circles. The barcode of the target $H_1(\mathscr{f})$ consists of two infinite bars, each originating from a level marked as a dashed line. (b) Bottleneck distance between the barcode of the estimator $H_1 \left( \mathcal{R}^{\delta\to 2\delta}(\mathscr{f}|_P) \right)$ and that of the target $H_1 \left( \mathscr{f} \right)$ as a function of $\delta$. Here, $\varepsilon_1$ (resp. $\varepsilon_2$) is the sampling error of $X_1$ (resp. $X_2$), and $\varrho_{X_1}$ (resp. $\varrho_{X_2}$) the convexity radius of $X_1$ (resp. $X_2$). All infinite bars are truncated at 10 to ensure the bottleneck distances are finite. (c) Visualization of the estimator $H_1 \left( \mathcal{R}^{\bullet\to 2\bullet}(\mathscr{f}|_P) \right)$ computed using MMA. (d) Visualization of the estimator $H_1 \left( \mathcal{C}^{\bullet}(\mathscr{f}|_P) \right)$ computed using MMA. Each colored region represents a persistent topological feature. The dashed lines indicate the birth times of the bars in the barcode of the target $H_1(\mathscr{f})$.
  • Figure 6: A noisy analog of Figure \ref{fig:expe:three_annulus_dataset}.
  • ...and 10 more figures
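
The input of Figure 2 (a regularly sampled unit circle with arclength distances and the vertical height function) can be reproduced with the minimal sketch below, which also evaluates the bounds of the slab $[2\varepsilon, \delta_0]$ with $\delta_0<\varrho_X/2$ for that sample. The function names are hypothetical, and two values are assumptions of this example rather than statements from the paper: the sampling error of $m$ equally spaced points is taken to be $\varepsilon=\pi/m$ (half the arclength gap), and the convexity radius of the unit circle is taken to be $\varrho_X=\pi/2$.

```python
import math


def circle_sample(m):
    """Regular sample of the unit circle: angles, heights, arclength distances."""
    angles = [2 * math.pi * k / m for k in range(m)]
    # Vertical height function: the y-coordinate of each sample point.
    heights = [math.sin(a) for a in angles]

    def arclength(i, j):
        # Geodesic distance on the circle: the shorter of the two arcs.
        d = abs(angles[i] - angles[j])
        return min(d, 2 * math.pi - d)

    dist = [[arclength(i, j) for j in range(m)] for i in range(m)]
    return angles, heights, dist


def slab_bounds(m):
    """Return (2 * eps, rho / 2) under the assumptions stated in the lead-in."""
    eps = math.pi / m   # assumed sampling error of m equally spaced points
    rho = math.pi / 2   # assumed convexity radius of the unit circle
    return 2 * eps, rho / 2


angles, heights, dist = circle_sample(8)
lower, upper = slab_bounds(16)
```

Under these assumptions the slab is nonempty only when $2\pi/m < \pi/4$, that is, when the sample has more than 8 points.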

Theorems & Definitions (62)

  • Lemma 2.1: chazal2011scalar
  • Remark 2.2
  • Theorem 2.3: oudot2024stability
  • Theorem 2.4
  • Definition 2.5: $(a,b)$-standard measure
  • Definition 2.6: Convergence rates
  • Definition 2.7: Estimator
  • Definition 2.8: Minimax convergence rates
  • Remark 3.1
  • Theorem 3.2
  • ...and 52 more