Topological Autoencoders++: Fast and Accurate Cycle-Aware Dimensionality Reduction

Mattéo Clémot, Julie Digne, Julien Tierny

TL;DR

TopoAE++ advances topology-aware dimensionality reduction by extending TopoAE to preserve $PH^{1}$ cycles through a cascade distortion loss that enforces isometric filling of 1-cycles. It introduces a fast planar $PH$ computation algorithm tailored to 2D embeddings, enabling practical optimization, and demonstrates a strong balance between topological accuracy and visual fidelity on synthetic and real datasets. Theoretical results clarify when $PH^{0}$ preservation guarantees hold and motivate the new $PH^{1}$-aware formulation, while empirical results show competitive $PD^{1}$ distances and improved cycle visualization compared to baselines. The work provides an open-source C++ implementation and outlines directions toward higher-dimensional latent spaces and homology levels, broadening the applicability of cycle-aware DR in complex data landscapes.

Abstract

This paper presents a novel topology-aware dimensionality reduction approach aiming at accurately visualizing the cyclic patterns present in high dimensional data. To that end, we build on the Topological Autoencoders (TopoAE) formulation. First, we provide a novel theoretical analysis of its associated loss and show that a zero loss indeed induces identical persistence pairs (in high and low dimensions) for the $0$-dimensional persistent homology (PH$^0$) of the Rips filtration. We also provide a counter example showing that this property no longer holds for a naive extension of TopoAE to PH$^d$ for $d\ge 1$. Based on this observation, we introduce a novel generalization of TopoAE to $1$-dimensional persistent homology (PH$^1$), called TopoAE++, for the accurate generation of cycle-aware planar embeddings, addressing the above failure case. This generalization is based on the notion of cascade distortion, a new penalty term favoring an isometric embedding of the $2$-chains filling persistent $1$-cycles, hence resulting in more faithful geometrical reconstructions of the $1$-cycles in the plane. We further introduce a novel, fast algorithm for the exact computation of PH for Rips filtrations in the plane, yielding improved runtimes over previously documented topology-aware methods. Our method also achieves a better balance between the topological accuracy, as measured by the Wasserstein distance, and the visual preservation of the cycles in low dimensions. Our C++ implementation is available at https://github.com/MClemot/TopologicalAutoencodersPlusPlus.
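The Wasserstein distance between persistence diagrams, used above as the measure of topological accuracy, can be illustrated with a minimal Python sketch. It assumes the standard definition, in which a point may be matched to its orthogonal projection on the diagonal, together with a Euclidean ground metric (an assumption of this example, not a detail stated here). The function name `wasserstein2` is hypothetical and unrelated to the authors' C++ code, and the brute-force search over matchings is only viable for very small diagrams.

```python
import math
from itertools import permutations


def diagonal_projection(p):
    """Orthogonal projection of a (birth, death) point onto the diagonal."""
    m = (p[0] + p[1]) / 2.0
    return (m, m)


def wasserstein2(diagram_a, diagram_b):
    """Brute-force 2-Wasserstein distance between two tiny persistence diagrams.

    Each diagram is a list of (birth, death) pairs. Every diagram is padded
    with one diagonal slot (None) per point of the other diagram, so that any
    point can be matched to the diagonal instead of to a real point.
    """
    a = list(diagram_a) + [None] * len(diagram_b)
    b = list(diagram_b) + [None] * len(diagram_a)

    def cost(p, q):
        if p is None and q is None:
            return 0.0  # diagonal matched to diagonal costs nothing
        if p is None:
            p = diagonal_projection(q)
        if q is None:
            q = diagonal_projection(p)
        # Squared Euclidean ground distance (assumption of this sketch).
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    best = min(
        sum(cost(a[i], b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(b)))
    )
    return math.sqrt(best)


if __name__ == "__main__":
    # Toy diagrams, not taken from the paper.
    print(wasserstein2([(0.0, 2.0), (1.0, 5.0)], [(0.0, 2.5)]))
```

In practice a dedicated assignment solver would replace the enumeration of permutations; the sketch only makes the matching-with-diagonal idea concrete.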

Paper Structure

This paper contains 51 sections, 6 theorems, 17 equations, 23 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

For any point clouds $X$ and $Z$ of equal size, we have the following inequality: […] Besides, under the general position hypothesis (unique pairwise distances), if $\mathcal{L}_{\mathrm{TAE}}^{0}(X,Z)=0$, then $\mathcal{D}_{\mathrm{Rips}}^{0}(X)=\mathcal{D}_{\mathrm{Rips}}^{0}(Z)$ and the $\text{PH}^{0}$ pairs are the same, i.e. $\mathsf{MST}(X)=\dots$
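The link between $\text{PH}^{0}$ pairs and the minimum spanning tree in the lemma can be sketched in a few lines of Python. The sketch assumes that $\text{PH}^{0}$ deaths of the Rips filtration are the MST edge lengths (with the diameter convention for the threshold), and that the $\text{PH}^{0}$ loss compares the lengths of the MST edges of one cloud with the lengths of the same index pairs in the other cloud, as in the original TopoAE formulation and up to constant factors. The exact normalization used in the paper is not shown here, and the function names are hypothetical.

```python
import math
from itertools import combinations


def mst_edges(points):
    """Kruskal's algorithm on the complete Euclidean graph.

    Returns the MST as a list of index pairs (i, j), by increasing length.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    tree = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge merges two components: one PH^0 class dies
            parent[ri] = rj
            tree.append((i, j))
    return tree


def ph0_diagram(points):
    """PH^0 pairs (birth, death): birth 0, death = MST edge length.

    The essential class (infinite persistence) is omitted.
    """
    return [(0.0, math.dist(points[i], points[j])) for i, j in mst_edges(points)]


def topoae0_loss(X, Z):
    """Assumed PH^0 loss: squared length mismatch over MST(X) and MST(Z) edges."""
    loss = 0.0
    for i, j in mst_edges(X) + mst_edges(Z):
        loss += (math.dist(X[i], X[j]) - math.dist(Z[i], Z[j])) ** 2
    return loss


if __name__ == "__main__":
    # Toy cloud with unique pairwise distances, and a translated copy of it.
    X = [(0, 0), (1, 0), (3, 0), (3, 4)]
    Z = [(x + 10, y + 5) for x, y in X]
    print(topoae0_loss(X, Z), mst_edges(X) == mst_edges(Z))
```

For the translated copy, the loss vanishes and both the MST index pairs and the $\text{PH}^{0}$ diagrams coincide, which is the situation the lemma describes; the converse direction (zero loss implies equal diagrams) is what the lemma proves and is not demonstrated by this toy example.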

Figures (23)

  • Figure 1: Comparison of DR methods on three synthetic 3D point clouds (see the test data section for a description), along with the metric distortion $\mathfrak{D}(X,Z)$ between the input $X$ and its planar embedding $Z$, the Wasserstein distance $\mathrm{PD}\mathcal{W}^1(X,Z)= \mathcal{W}_2\bigl(\mathcal{D}_{\mathrm{Rips}}^{1}(X),\mathcal{D}_{\mathrm{Rips}}^{1}(Z)\bigr)$ between their respective 1-dimensional persistence diagrams, and the running time $t$ in seconds. The best value for an indicator is written in bold, the second best is underlined. In the second row, a generator of the most persistent $\text{PH}^{1}$ pair in the high-dimensional input is projected in the 2D embedding in transparency and slightly smoothed for visualization purposes (to distinguish it from the point cloud). The generator color map depicts its arc-length parameterization. In the third row, the input -- which is sampled around the edges of a tetrahedron in 3D -- features 3 significantly persistent $\text{PH}^{1}$ pairs, and a generator for each is represented both in the input and in the 2D embeddings with a specific color. Quantitatively, our approach (TopoAE++) generates planar embeddings with competitive Wasserstein distances. Qualitatively, it produces planar embeddings with fewer crossings of the projected high-dimensional persistent generators (colored curves), thereby producing visualizations that more faithfully depict the topological handles present in high dimensions.
  • Figure 2: Left: geometric graphs for a point cloud $X$ in the plane. Its $\mathsf{MST}$ is depicted in red. The edges from the $\mathsf{RNG}$ which are not in the $\mathsf{MST}$ are shown in continuous yellow, while those which are in the $\mathsf{UG}$ but not in the $\mathsf{RNG}$ are shown in dashed yellow. The remaining Delaunay edges are shown in gray. Three lenses are shown: the purple one is devoid of points of $X$, therefore the associated yellow edge belongs to the $\mathsf{RNG}$; the blue one -- associated with the dashed yellow edge -- is devoid of points of the link of that edge but contains another point of $X$ (the blue one), therefore this edge is in the $\mathsf{UG}$ but not in the $\mathsf{RNG}$; the green one contains the green point and the associated gray edge is in neither the $\mathsf{RNG}$ nor the $\mathsf{UG}$. Right: within the highlighted bottom $\mathsf{RNG}$-polygon, an $\mathsf{MML}$ triangulation has replaced the Delaunay triangulation, with its longest edge highlighted in black. Note that for the two other polygons, the Delaunay triangulation is already an $\mathsf{MML}$ triangulation within these polygons.
  • Figure 3: Rips complexes of the same planar point cloud $X$, for increasing diameter thresholds $r$ (only the 2-skeletons, i.e., the vertices, edges and triangles, of the simplicial complexes are shown). This threshold increase induces a sequence of nested simplicial complexes, whose topology varies along the process. From left to right, the number of connected components is successively $n$, then $2$, then $1$, while the number of handles is successively $0$, $2$, $1$, $2$, $1$ and $0$.
  • Figure 4: Two point clouds (red and blue) in the plane represented with their $\mathsf{RNG}$, along with their respective 1-dimensional persistence diagrams (right). The edges in $\mathsf{RNG}\setminus\mathsf{MST}$ are highlighted in bold and yellow. The number of non-diagonal points in each diagram is exactly the number of $\mathsf{RNG}$-polygons in the associated point cloud (3 for the red one, 2 for the blue one), and also the number of $\mathsf{RNG}\setminus\mathsf{MST}$ edges. The optimal assignment inducing the Wasserstein distance $\mathcal{W}_2$ between them is shown in black.
  • Figure 5: Some steps of two executions of the EliminateBoundaries procedure run on the 2-simplices $\sigma_0$ (top row) and $\sigma_1$ (bottom row), corresponding to the creation of two $\text{PH}^{1}$ pairs in the previous planar point cloud (with its smallest cycle removed for clarity). Unpaired edges -- which are in $\mathsf{RNG}(X)\setminus\mathsf{MST}(X)$ -- are depicted in yellow. At each step, $\mathrm{cascade}[\sigma]$ is shown in red, and its boundary $\partial(\mathrm{cascade}[\sigma])$ is highlighted in bold, with its longest edge, i.e., $\tau=\text{Youngest}\bigl(\partial(\mathrm{cascade}[\sigma])\bigr)$, dotted. When it exists, $\mathrm{cascade}\bigl[\mathrm{partner}[\tau]\bigr]$ is shown in blue. A persistent pair $(\tau, \sigma)$ is created when $\mathrm{partner}[\tau]=\varnothing$, i.e., when $\tau$ coincides with a yellow unpaired edge (rightmost images). In both executions, $\mathrm{Rips}_{\delta(\sigma)}^1$ is shown in light gray.
  • ...and 18 more figures
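
The count stated in the caption of Figure 4 (for a planar point cloud, the number of non-diagonal points of the 1-dimensional diagram equals the number of $\mathsf{RNG}\setminus\mathsf{MST}$ edges) can be explored with a small Python sketch. It assumes general position (unique pairwise distances) and uses the standard lens test for the relative neighborhood graph; the quadratic-per-edge brute force and the function names are choices of this example, not the authors' fast planar algorithm.

```python
import math
from itertools import combinations


def rng_edges(points):
    """Relative neighborhood graph by brute force.

    Edge (i, j) is kept when its lens is empty, i.e. when no third point k
    is strictly closer to both i and j than i and j are to each other.
    """
    n = len(points)

    def d(a, b):
        return math.dist(points[a], points[b])

    edges = []
    for i, j in combinations(range(n), 2):
        lens_is_empty = not any(
            max(d(i, k), d(j, k)) < d(i, j)
            for k in range(n)
            if k != i and k != j
        )
        if lens_is_empty:
            edges.append((i, j))
    return edges


def count_rng_minus_mst(points):
    """Number of RNG edges outside the MST, for a non-empty point cloud.

    The MST is a subgraph of the RNG and has len(points) - 1 edges, so the
    count is obtained by subtraction. Per the Figure 4 caption, this is the
    number of non-diagonal PH^1 points for a planar cloud.
    """
    return len(rng_edges(points)) - (len(points) - 1)


if __name__ == "__main__":
    # Toy clouds, not taken from the paper: a perturbed square (one cycle)
    # and three nearly aligned points (no cycle).
    square = [(0.0, 0.0), (4.0, 0.0), (4.2, 3.9), (0.1, 4.3)]
    line = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.1)]
    print(count_rng_minus_mst(square), count_rng_minus_mst(line))
```

For the perturbed square, the four sides pass the lens test while both diagonals fail it, leaving one edge outside the spanning tree; the nearly aligned points leave none.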

Theorems & Definitions (12)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof : Proof outline
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • ...and 2 more