Robust Barycenters of Persistence Diagrams

Keanu Sisouk, Eloi Tanguy, Julie Delon, Julien Tierny

TL;DR

This work addresses the sensitivity of persistence-diagram barycenters to outliers by generalizing Wasserstein barycenters to generic transport costs with $q>1$. It adapts a fixed-point method to compute robust barycenters via a two-step optimization: assignment and a gradient-based ground-barycenter update using $\mathfrak{b}_q$, ensuring nonincreasing Fréchet energy under suitable conditions. The framework is validated on clustering and Wasserstein dictionary encoding tasks, with empirical evidence that lower values of $q$ (e.g., $q$ in $[1.2,1.4]$) yield stronger robustness to outliers while maintaining representational quality. The authors provide a PyTorch-enabled implementation and demonstrate practical benefits for real-life ensembles of persistence diagrams, highlighting potential extensions to other topological descriptors.

Abstract

This short paper presents a general approach for computing robust Wasserstein barycenters of persistence diagrams. The classical method consists in computing assignment arithmetic means after finding the optimal transport plans between the barycenter and the persistence diagrams. However, this procedure only works for the transportation cost related to the $q$-Wasserstein distance $W_q$ when $q=2$. We adapt an alternative fixed-point method to compute a barycenter diagram for generic transportation costs ($q > 1$), in particular those robust to outliers, $q \in (1,2)$. We show the utility of our work in two applications: (i) the clustering of persistence diagrams on their metric space and (ii) the dictionary encoding of persistence diagrams. In both scenarios, we demonstrate the added robustness to outliers provided by our generalized framework. Our Python implementation is available at this address: https://github.com/Keanu-Sisouk/RobustBarycenter
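To make the role of $q$ concrete, the following is a minimal sketch of a per-point barycenter update once the assignments are fixed. It assumes the ground cost is the Euclidean distance raised to the power $q$, and it uses an iteratively reweighted mean (a Weiszfeld-type scheme) as a stand-in for the paper's gradient-based update; it is not taken from the authors' repository. The function names `frechet_energy` and `ground_barycenter` and the toy points are hypothetical.

```python
import math

def frechet_energy(x, points, q):
    """Sum of Euclidean distances from x to each matched point, raised to the power q."""
    return sum(math.dist(x, p) ** q for p in points)

def ground_barycenter(points, q, n_iter=200, eps=1e-12):
    """Approximate the minimizer of frechet_energy for 1 <= q <= 2.

    Assumption: an iteratively reweighted mean is used here for simplicity;
    the paper describes a gradient-based update instead.
    """
    dim = len(points[0])
    # Start from the arithmetic mean, which is the exact answer when q = 2.
    x = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(n_iter):
        # Weights d_i^(q-2): far-away points (outliers) count less when q < 2.
        weights = [max(math.dist(x, p), eps) ** (q - 2.0) for p in points]
        total = sum(weights)
        x = [sum(w * p[k] for w, p in zip(weights, points)) / total
             for k in range(dim)]
    return x

# Hypothetical matched (birth, death) points: two inliers and one outlier.
matched = [(0.0, 1.0), (0.0, 1.0), (0.0, 11.0)]
mean_point = [sum(p[k] for p in matched) / len(matched) for k in range(2)]
robust_point = ground_barycenter(matched, q=1.5)

print("arithmetic mean:", mean_point, frechet_energy(mean_point, matched, 1.5))
print("q = 1.5 update: ", robust_point, frechet_energy(robust_point, matched, 1.5))
```

In this toy case the arithmetic mean is pulled toward the outlier, while the $q = 1.5$ update stays closer to the two inliers and reaches a lower energy for that cost. With $q = 2$ the weights are all equal and the update reduces to the arithmetic mean.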

Paper Structure

This paper contains 14 sections, 5 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Example of persistence diagrams of a smooth (left) and noisy scalar field (right). The four main features are represented with long bars in the persistence diagrams. In the noisy diagram, the noise in the scalar field is encoded by small bars near the diagonal.
  • Figure 2: Comparison of barycenters computed with different values of $q$. On the left we have terrain views of four scalar fields colored in blue, gray, yellow and green, the latter being an outlier (featuring more peaks). The corresponding persistence diagrams are represented with matching colors and the barycenters are represented in cyan. The barycenter with $q = 2$ (center) is more sensitive to the presence of the green outlier, with two cyan bars of medium persistence, due to the outlier peaks in the green dataset. For $q=1.5$ (right), the persistence of these two bars is significantly reduced, and so will be their importance in distance computations.
  • Figure 3: Simple example where computing the arithmetic mean instead of optimizing $\mathfrak{b}_q$ increases the Fréchet energy (denoted $E_F$) for $q = 1$. We have three simple persistence diagrams, in dark blue, gray and yellow, each having a single point. For this problem, the transport plans are fixed and the barycenter has only one point. On the left, we initialized the barycenter as the diagram encoded in green. In the middle, we have the barycenter candidate, encoded in cyan, obtained after one iteration when computing an arithmetic mean. We can see that the Fréchet energy (for $q = 1$) increased. On the right, we have the barycenter candidate, encoded in purple, obtained when optimizing $\mathfrak{b}_q$ instead, this time displaying a decrease of the Fréchet energy after one iteration.
  • Figure 4: Comparison of clustering results on an ensemble of diagrams of Gaussian mixtures. On the left we have the 3 clusters: one cluster of 2 Gaussians (top), one cluster of 3 Gaussians (middle) and one cluster of 4 Gaussians (bottom). In the first and second clusters, we inserted an outlier (highlighted in green and cyan respectively) by setting one isolated pixel (highlighted in red) to an arbitrarily high value. Those pixels result in persistent pairs in the corresponding diagrams. On top we have the distance matrices of $W_q$ for $q \in \{2, 1.8, 1.6, 1.4, 1.2, 1\}$. In the distance matrices, the clustering results are shown with dashed squares (clusters are colored in dark purple, purple and pale purple) while the outlier diagrams are indicated with a plain square (green and cyan). In the three frames, we visualize the evolution of each cluster and its barycenter for each $q$. Each frame corresponds to a cluster (top: cluster 1, middle: cluster 2, bottom: cluster 3). The outlier diagrams are colored in green and cyan. The barycenters are drawn opaque while the diagrams of each cluster are drawn transparent. We observe that for $q \in \{2,1.8\}$ the green outlier is incorrectly assigned to the second cluster (as it exhibits the same number of persistence pairs, 3, as the entries of cluster 2). From $q = 1.6$, it is correctly assigned to the first cluster, as shown by the green arrow (it shifts from the second frame to the first one). Similarly, given its number of persistence pairs, the cyan outlier is incorrectly assigned to the third cluster until $q=1.6$; from $q = 1.4$, it shifts to the second cluster, as indicated by the cyan arrow.
  • Figure 5: Visual comparison of distance matrices using $W_q$ for $q \in \{2, 1.8, 1.6, 1.4, 1.2, 1\}$ on the Volcanic Eruption ensemble and the clustering results. This ensemble of 12 persistence diagrams has a natural outlier highlighted in cyan on the distance matrices. On the top, we can see that for $q \in \{2, 1.8\}$, the clustering algorithm keeps the outlier alone, groups the first 8 diagrams together and groups the last three together. For $q$ from $1.6$ down to $1.2$, the correct clusters are returned. For $q = 1$, however, the clusters are not discriminated enough. In the middle we have one representative scalar field for each cluster, and on the right the corresponding diagrams, the cyan scalar field and diagram being the outlier. On the bottom, we have a visual comparison of three barycenters (pink) of the last cluster in three cases: one computed with the outlier in cyan when $q = 2$, one without the outlier when $q = 2$ and one with the outlier when $q = 1.2$. We witness the influence of the outlier on the barycenter on the left, as there are two persistent pairs (in pink) higher than the ones in the other three diagrams of the cluster. We also notice the presence of an isolated pair above the diagonal that is generated by the isolated cyan pair. In the other cases, the barycenters (pink) are very similar, showing the robustness of this $1.2$-barycenter to the presence of this outlier.
  • ...and 1 more figures