TopoMap++: A faster and more space efficient technique to compute projections with topological guarantees

Vitoria Guardieiro, Felipe Inagaki de Oliveira, Harish Doraiswamy, Luis Gustavo Nonato, Claudio Silva

TL;DR

The paper tackles the challenge of visualizing high-dimensional data with topology-preserving guarantees while avoiding sparse layouts and high computation costs. It introduces TopoMap++, which combines a space-efficient layout, component highlighting driven by topological simplification, a TreeMap-based exploration interface, and a fast approximate minimum spanning tree (AMST) computed from the Vamana graph to scale the projections. Key contributions include a) identifying and emphasizing dense topological components, b) a nested TreeMap visualization for interactive exploration, and c) an efficient AMST pipeline that preserves the essential 0-dimensional topology while delivering large speedups. Results across diverse datasets show substantially better use of the visual space and practical scalability for analyzing complex high-dimensional embeddings.
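The idea behind an approximate MST can be illustrated with a minimal sketch, assuming that running Kruskal's algorithm on a sparse neighbour graph instead of the complete graph is a fair stand-in for the paper's AMST step. The Vamana graph itself is not implemented here: a brute-force k-nearest-neighbour graph (the hypothetical `knn_graph` helper, with an illustrative parameter `k`) takes its place. If the sparse graph is disconnected, the result is a spanning forest rather than a tree.

```python
import math
from itertools import combinations


def knn_graph(points, k):
    """Brute-force k-nearest-neighbour graph as weighted edges (length, i, j).

    Hypothetical stand-in for the Vamana graph: any sparse graph that contains
    most of the short MST edges plays the same role in this sketch.
    """
    edges = set()
    for i, p in enumerate(points):
        neighbours = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))  # undirected edge, stored once
    return [(math.dist(points[i], points[j]), i, j) for i, j in edges]


def kruskal(n, weighted_edges):
    """Kruskal's algorithm; returns a spanning forest if the graph is disconnected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(weighted_edges):
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: keep it
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
approx = kruskal(len(pts), knn_graph(pts, k=2))
complete = [(math.dist(pts[i], pts[j]), i, j) for i, j in combinations(range(len(pts)), 2)]
exact = kruskal(len(pts), complete)
print(sum(w for w, _, _ in approx), sum(w for w, _, _ in exact))  # → 10.0 10.0
```

On these five collinear points the 2-NN graph already contains every exact MST edge, so both trees have total length 10; on real data the approximate tree can be longer, which is the accuracy-for-speed trade-off an AMST accepts.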

Abstract

High-dimensional data, characterized by many features, can be difficult to visualize effectively. Dimensionality reduction techniques, such as PCA, UMAP, and t-SNE, address this challenge by projecting the data into a lower-dimensional space while preserving important relationships. TopoMap is another technique that excels at preserving the underlying structure of the data, leading to interpretable visualizations. In particular, TopoMap maps the high-dimensional data into a visual space, guaranteeing that the 0-dimensional persistence diagram of the Rips filtration of the visual space matches the one from the high-dimensional data. However, the original TopoMap algorithm can be slow, and its layout can be too sparse for large and complex datasets. In this paper, we propose three improvements to TopoMap: 1) a more space-efficient layout, 2) a significantly faster implementation, and 3) a novel TreeMap-based representation that makes use of the topological hierarchy to aid the exploration of the projections. These advancements make TopoMap, now referred to as TopoMap++, a more powerful tool for visualizing high-dimensional data, as we demonstrate through several use case scenarios.
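The guarantee described in the abstract can be checked by brute force on small inputs. The sketch below, assuming O(n² log n) enumeration of all pairwise edges is acceptable, computes the finite death times of the 0-dimensional Rips persistence diagram and compares a high-dimensional point set with a 2D layout. The helper names are illustrative, and the layout coordinates are hand-picked, not produced by TopoMap.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Finite death times of the 0-dimensional persistence diagram of the Rips filtration.

    Every point is born at 0. Edges are added in increasing length; each edge that
    joins two distinct components kills one of them at that length. One component
    never dies (infinite persistence) and is omitted from the returned list.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def same_h0_diagram(high_dim, low_dim, tol=1e-9):
    """True if both point sets have matching 0-dimensional diagrams (up to tol)."""
    a, b = h0_deaths(high_dim), h0_deaths(low_dim)
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


high = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
layout = [(0.0, 0.0), (5.0, 0.0), (-12.0, 0.0)]    # hand-picked, not TopoMap output
dropped_z = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]   # naive projection onto the xy-plane
print(h0_deaths(high))                   # → [5.0, 12.0]
print(same_h0_diagram(high, layout))     # → True
print(same_h0_diagram(high, dropped_z))  # → False
```

The hand-picked layout distorts one pairwise distance (17 instead of 13) yet keeps the diagram intact, whereas dropping the z-coordinate collapses two points onto each other and changes the diagram.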

Paper Structure

This paper contains 19 sections, 1 theorem, 7 figures, 4 tables, 2 algorithms.

Key Result

Lemma 1

Let $\mathbb{K}_0 = \{e_1, e_2, \dots, e_{n-1}\}$ be the ordered set of topology-changing edges of $P$. Then $\mathbb{K}_0$ is exactly the set of edges of the Euclidean distance minimum spanning tree ($E_{mst}$) of the points $P$, in increasing order of length.
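Lemma 1 can be checked numerically on small inputs. The sketch below, assuming distinct pairwise distances so that the MST is unique, collects the topology-changing edges by running the Rips filtration with a union-find structure and compares them with an MST built independently by Prim's algorithm. The function names are illustrative, and the brute-force edge enumeration is only practical for small point sets.

```python
import math
from itertools import combinations


def topology_changing_edges(points):
    """Edges of the Rips filtration, in increasing length, that merge two components."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    changing = []
    for _, i, j in sorted((math.dist(points[i], points[j]), i, j)
                          for i, j in combinations(range(n), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:               # adding (i, j) reduces the number of components
            parent[ri] = rj
            changing.append((i, j))
    return changing


def prim_mst_edges(points):
    """Euclidean MST computed independently with Prim's algorithm, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # shortest known distance from the tree to each vertex
    link = [-1] * n            # tree vertex realising that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if link[u] != -1:
            edges.append((min(u, link[u]), max(u, link[u])))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 5.0), (10.0, 1.0)]
changing = topology_changing_edges(pts)
print(changing)                                   # → [(0, 1), (1, 2), (2, 3), (2, 4)]
print(set(changing) == set(prim_mst_edges(pts)))  # → True
```

The filtration loop is exactly Kruskal's algorithm on the complete graph, which is the intuition behind the lemma: an edge changes the 0-dimensional topology precisely when Kruskal would accept it.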

Figures (7)

  • Figure 1: Computing the hierarchical tree based on the Rips filtration defined by the minimum spanning tree. (a) MST over an input with ten points (labeled from A to J). The weight (length) of each edge is specified next to it. (b) The set of components after the filtration has processed 3 edges of the MST. The merged components now become a single node labeled using "[]". (c) The set of components after the filtration has processed 5 edges of the MST.
  • Figure 2: (a) The hierarchy of the components formed during the filtration in Figure 1, represented as a hierarchical binary tree. (b) Simplified tree when $\eta=2$ (as well as when $\eta=3$). The components chosen when $\eta=2$ are shown with a blue border, while those chosen when $\eta=3$ are shown with a red border.
  • Figure 3: Illustration of how the edge lengths impact the point density. The black points form a component, and the segments correspond to the edges processed to form that component. Point A has two edges in the MST: one connecting it to the black component (with length $l_1$) and the other connecting it to point B (with length $l_2$). Since $l_1 < l_2$, A is the only point inside the ball centered at A with radius $l_1$. Similarly, B is the only point inside the ball centered at B with radius $l_2$. Since all edges between the points in the component are much shorter than $l_2$, the density of those points in the visual space is higher than the density around point B.
  • Figure 4: TreeMaps corresponding to (a) the unsimplified hierarchical tree; (b) simplified tree with $\eta=2$; and (c) simplified tree with $\eta=3$. The gray color represents the component with infinite persistence.
  • Figure 5: (a) TopoMap projection, (b) Hierarchical TreeMap, and (c) TopoMap++ projection generated using the urban dataset from Case Study 1. The selected components in (b) are used to emphasize the clusters in the TopoMap++ projection (c). The same points are also colored with the corresponding colors in the original TopoMap projection (a). Note that these components end up being small due to the inefficient use of the visual space by the original algorithm. With our proposed approach, such features become easy to identify and analyze.
  • ...and 2 more figures
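The tree construction described in Figures 1 and 2(a) can be sketched as a binary merge tree over the MST edges. This is a minimal sketch under stated assumptions: `Node` and `build_hierarchy` are hypothetical names, the input is a list of MST edges (for example, from Kruskal's algorithm), and the $\eta$-based simplification of Figure 2(b) is not modeled, since its exact rule is not stated in this summary. The four-point example uses made-up edge lengths, not the ten-point input of Figure 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    points: frozenset                 # indices of the points in this component
    height: float = 0.0               # MST edge length that formed it (0 for leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_hierarchy(n, mst_edges):
    """Binary merge tree from MST edges given as (length, i, j).

    Edges are processed in increasing length; each one merges two components,
    and the merged component becomes a new internal node whose children are
    the two nodes it replaced. Assumes mst_edges spans all n points.
    """
    parent = list(range(n))
    node_of = {i: Node(frozenset([i])) for i in range(n)}  # component root -> tree node

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for length, i, j in sorted(mst_edges):
        ri, rj = find(i), find(j)
        if ri == rj:  # cannot happen for a tree; skip defensively
            continue
        left, right = node_of.pop(ri), node_of.pop(rj)
        parent[ri] = rj
        node_of[rj] = Node(left.points | right.points, length, left, right)
    return node_of[find(0)]


root = build_hierarchy(4, [(1.0, 0, 1), (3.0, 2, 3), (2.0, 1, 2)])
print(sorted(root.points), root.height)            # → [0, 1, 2, 3] 3.0
print(sorted(root.left.points), root.left.height)  # → [0, 1, 2] 2.0
```

Each internal node records the edge length at which its component formed, which is the kind of information a hierarchical view such as the TreeMap in Figure 4 could draw on.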

Theorems & Definitions (2)

  • Lemma 1 (Doraiswamy et al. [doraiswamy2020topomap], Lemma 2)
  • Definition 1