The Flood Complex: Large-Scale Persistent Homology on Millions of Points

Florian Graf, Paolo Pellizzoni, Martin Uray, Stefan Huber, Roland Kwitt

TL;DR

The Flood complex tackles the core scalability barrier of persistent homology on large-scale point clouds by coupling a small landmark subset with a flooding filtration on the landmark Delaunay triangulation. The method provides theoretical guarantees about stability and approximation quality, while leveraging GPU acceleration to achieve orders-of-magnitude speedups over Alpha-based pipelines on millions of points. Empirical results on both synthetic and real 3D data demonstrate accurate topological summaries and improved downstream object classification in complex geometric settings. This approach broadens the practical applicability of PH in machine learning pipelines that require fast, scalable topological features for large, high-dimensional datasets.

Abstract

We consider the problem of computing persistent homology (PH) for large-scale Euclidean point cloud data, aimed at downstream machine learning tasks, where the exponential growth of the most widely used Vietoris-Rips complex imposes serious computational limitations. Although more scalable alternatives such as the Alpha complex or sparse Rips approximations exist, they often still result in a prohibitively large number of simplices. This poses challenges in the complex construction and in the subsequent PH computation, prohibiting their use on large-scale point clouds. To mitigate these issues, we introduce the Flood complex, inspired by the advantages of the Alpha and Witness complex constructions. Informally, at a given filtration value $r\geq 0$, the Flood complex contains all simplices from a Delaunay triangulation of a small subset of the point cloud $X$ that are fully covered by balls of radius $r$ emanating from $X$, a process we call flooding. Our construction allows for efficient PH computation, possesses several desirable theoretical properties, and is amenable to GPU parallelization. Scaling experiments on 3D point cloud data show that we can compute PH of up to dimension 2 on several millions of points. Importantly, when evaluating object classification performance on real-world and synthetic data, we provide evidence that this scaling capability is needed, especially if objects are geometrically or topologically complex, yielding performance superior to other PH-based methods and neural networks for point cloud data. Source code and datasets are available at https://github.com/plus-rkwitt/flooder.

Paper Structure

This paper contains 15 sections, 6 theorems, 7 equations, 5 figures, 3 tables.

Key Result

Theorem 2

The Flood complex is bottleneck stable with respect to its first argument, i.e., given $L,X,X'\subset \mathbb R^d$, it holds that $\forall i\in \mathbb N$

Figures (5)

  • Figure 1: Schematic overview of the Flood complex $\mathrm{Flood}_r(X,L)$ (top), the Alpha complex on a subsample $\mathrm{Alpha}_r(L)$ (bottom), and their accordance with the union of balls $X_r$ at different radii $r$. The point cloud $X$ is marked by $\bullet$ and the landmarks $L\subset X$ by $\textcolor{red}{\bullet}$; the balls of radius $r$ are also shown.
  • Figure 2: Exemplary hexbin plots of persistence diagrams of RV-A89 (left) and the Leptoseris paschalensis coral (right). Gray corresponds to Alpha of the full point cloud, blue to Flood PH with 10k landmarks and orange to Alpha$^\dagger$ PH with 75k points. Point clouds are visualized via small spheres colored by distance to their bounding box center (for RV-A89) or by elevation (for the Leptoseris paschalensis coral). Best viewed in color.
  • Figure 3: Approximation quality of Flood PH and Alpha PH on RV-A89. The left panel shows bottleneck distances to Alpha PH (full) in $H_0$, $H_1$ and $H_2$ when varying the number of landmarks for Flood PH and the subsample size for Alpha PH. The middle panel shows the Hausdorff distance between the full point cloud and the landmarks (for Flood PH) or the points in the subsample (for Alpha PH). The right panel shows the color coding used in all the plots.
  • Figure 4: Runtime (in s) of Flood PH and Alpha PH for swisscheese-like point clouds: (a) in $\mathbb{R}^3$, varying the point cloud size $|X|$ with $|L|=1$k landmarks; (b) in $\mathbb{R}^3$, varying the number of landmarks $|L|$ with $|X|=1$M points; (c) varying the dimensionality with $|X|=1$M and $|L|=1$k.
  • Figure 5: Comparison of classification accuracy (on swisscheese) and runtime (in s) between Flood PH (2k landmarks) and Alpha PH when the latter has access to an increasing number of points in $X$. The leftmost panel shows an example of a swisscheese point cloud with 10 holes.

Theorems & Definitions (7)

  • Definition 1
  • Theorem 2
  • Theorem 3
  • Corollary 4
  • Theorem 5
  • Lemma 6
  • Lemma 7