Understanding the Topology and the Geometry of the Space of Persistence Diagrams via Optimal Partial Transport

Vincent Divol, Théo Lacombe

TL;DR

This article introduces a generalization of persistence diagrams, namely Radon measures supported on the upper half plane, and explores topological properties of this new space, which also hold for the closed subspace of persistence diagrams.

Abstract

Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport-based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this article, by considering the space of persistence diagrams as a space of discrete measures, and by observing that its metrics can be expressed as optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the space of persistence diagrams. We explore topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, a geometric description of barycenters (Fréchet means) for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the strength of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.

Paper Structure

This paper contains 25 sections, 39 theorems, 114 equations, 7 figures.

Key Result

Proposition 3.1

Let $\mu, \nu \in \mathcal{M}$. The set of transport plans $\mathrm{Adm}(\mu,\nu)$ is sequentially compact for the VM topology on $E_\Omega := (\overline{\Omega} \times \overline{\Omega}) \setminus (\partial \Omega \times \partial \Omega)$. Moreover, if $\mu,\nu \in \mathcal{M}^p$, for th[...] Moreover, $\mathrm{OT}_p$ is a metric on $\mathcal{M}^p$.
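
For orientation, the objects named in the proposition can be written out as below. This is only a sketch of the usual optimal partial transport formulation, under the assumptions that $\Omega$ is the open upper half plane, $\partial \Omega$ is the diagonal, and $d$ is a ground distance on $\overline{\Omega}$. The symbol $C_p$ for the cost of a plan is introduced here for illustration, and the paper's own definitions take precedence over this summary.

```latex
% Admissible plans: marginals are constrained only away from the diagonal.
\mathrm{Adm}(\mu,\nu) = \bigl\{ \pi \text{ Radon measure on } E_\Omega \;:\;
    \pi(A \times \overline{\Omega}) = \mu(A), \quad
    \pi(\overline{\Omega} \times B) = \nu(B)
    \ \text{ for all Borel } A, B \subset \Omega \bigr\}

% Cost of a plan and the resulting distance.
C_p(\pi) = \iint_{E_\Omega} d(x,y)^p \, \mathrm{d}\pi(x,y),
\qquad
\mathrm{OT}_p(\mu,\nu) = \Bigl( \inf_{\pi \in \mathrm{Adm}(\mu,\nu)} C_p(\pi) \Bigr)^{1/p}
```

Because the marginal constraints are imposed only on subsets of $\Omega$, any amount of mass may be sent to or taken from $\partial \Omega$, which is what makes the transport problem partial.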

Figures (7)

  • Figure 1: An example of optimal partial matching between two diagrams. The bottleneck distance between these two diagrams is the length of the longest edge in this matching, while their Wasserstein distance $d_p$ is the $p$-th root of the sum of all edge lengths to the power $p$.
  • Figure 2: Some common linear representations of persistence diagrams. From left to right: A persistence diagram. Its persistence surface [Adams et al., 2017], which is a persistence measure. The corresponding persistence silhouette [Chazal et al., 2014]. The corresponding Betti curve [Umeda, 2017]. See Section \ref{subsec:continuity_of_representations} for details.
  • Figure 3: A transport map $f$ must satisfy that the mass $\nu(B)$ (light blue) is the sum of the mass $\mu(f^{-1}(B) \cap \mathcal{X})$ given by $\mu$ that is transported by $f$ onto $B$ (light red) and the mass $\nu(B \cap f(\partial \mathcal{X}))$ coming from $\partial \mathcal{X}$ and transported by $f$ onto $B$.
  • Figure 4: Illustration of differences between $\mathrm{OT}_p$, $\mathrm{OT}_\infty$, and vague convergence. Blue represents the mass on a point, while red designates distances. $(a)$ A case where $\mathrm{OT}_p(\mu_n, 0) \to 0$ for any $p < \infty$ while $\mathrm{OT}_\infty(\mu_n, 0) = 1$. $(b)$ A case where $\mathrm{OT}_\infty(\mu_n, 0) \to 0$ while for all $p < \infty$, $\mathrm{OT}_p(\mu_n, 0) \to \infty$. $(c)$ A sequence of persistence diagrams $a_n \in \mathcal{D}^\infty$, where $(a_n)_n$ converges vaguely to $a = \sum_{i} \delta_{x_i}$ and $\mathrm{Pers}_\infty(a_n)=\mathrm{Pers}_\infty(a)$, but $(a_n)_n$ does not converge to $a$ for $\mathrm{OT}_\infty$.
  • Figure 5: Global picture of the proof. The main idea is to observe that the cost induced by $\pi_i$ (red) is strictly greater than the sum of the costs induced by the $\pi_i'$s (blue), which leads to a strictly better energy.
  • ...and 2 more figures
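
The partial matching described in the caption of Figure 1 can be illustrated numerically for small finite diagrams. The sketch below is not taken from the paper: it assumes a Euclidean ground distance, represents a diagram as an array of (birth, death) pairs, and reduces partial matching to a square assignment problem by adding copies of the diagonal, solved with SciPy's `linear_sum_assignment`. The function names `wasserstein_distance` and `bottleneck_distance` are hypothetical helpers, and the threshold search used for the bottleneck distance is only practical for small inputs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def _augmented_cost(X, Y):
    """Edge lengths of the partial matching problem as a square matrix.

    Rows: points of X, then len(Y) copies of the diagonal.
    Columns: points of Y, then len(X) copies of the diagonal.
    """
    X = np.asarray(X, dtype=float).reshape(-1, 2)
    Y = np.asarray(Y, dtype=float).reshape(-1, 2)
    n, m = len(X), len(Y)
    C = np.zeros((n + m, n + m))
    # Point-to-point edges (Euclidean ground distance: an assumption).
    C[:n, :m] = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Point-to-diagonal edges: distance from (b, d) to the line b = d.
    C[:n, m:] = ((X[:, 1] - X[:, 0]) / np.sqrt(2))[:, None]
    C[n:, :m] = ((Y[:, 1] - Y[:, 0]) / np.sqrt(2))[None, :]
    # Diagonal-to-diagonal edges cost nothing (entries stay zero).
    return C


def wasserstein_distance(X, Y, p=2.0):
    """p-th root of the sum of p-th powers of edge lengths in an optimal matching."""
    C = _augmented_cost(X, Y)
    if C.size == 0:
        return 0.0
    rows, cols = linear_sum_assignment(C ** p)
    return float((C[rows, cols] ** p).sum() ** (1.0 / p))


def bottleneck_distance(X, Y):
    """Smallest t such that a matching using only edges of length <= t exists."""
    C = _augmented_cost(X, Y)
    for t in np.unique(C):  # candidate thresholds in increasing order
        forbidden = (C > t).astype(float)
        rows, cols = linear_sum_assignment(forbidden)
        if forbidden[rows, cols].sum() == 0:  # no forbidden edge is needed
            return float(t)
    return 0.0  # both diagrams are empty


# Usage: two points against one point; the short-lived point goes to the diagonal.
X = [(0.0, 4.0), (1.0, 1.2)]
Y = [(0.0, 3.0)]
print(wasserstein_distance(X, Y, p=2.0), bottleneck_distance(X, Y))
```

Note that the bottleneck distance is computed from its own optimal matching rather than read off the Wasserstein-optimal one, since the two matchings need not coincide.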

Theorems & Definitions (92)

  • Remark 1.1
  • Remark 1.2
  • Definition 2.1
  • Definition 2.2
  • Remark 2.1
  • Remark 3.1
  • Proposition 3.1
  • Remark 3.2
  • Lemma 3.1
  • proof
  • ...and 82 more