HUMAP: Hierarchical Uniform Manifold Approximation and Projection

Wilson E. Marcílio-Jr, Danilo M. Eler, Fernando V. Paulovich, Rafael M. Martins

TL;DR

HUMAP tackles the need for scalable, multi-scale visualization of high-dimensional data by introducing a hierarchical dimensionality reduction framework that preserves both local and global structure while maintaining a stable mental map across levels. It builds a bottom-up hierarchy using a kernel-based similarity $p_{i|j}=e^{-(d(x_i,x_j)-\rho_i)/\sigma_i}$ with a fixed $k$ and random walks to identify landmarks, then generates level-specific embeddings with a UMAP-like objective guided by higher-level coordinates; a controlled movement parameter $\theta$ preserves layout continuity during drill-down. The method supports subset projection through a principled association of non-landmark points to landmarks and propagates representations across levels, enabling progressive analysis as demonstrated in a COVID-19 tweet case study. Empirical results on MNIST, FMNIST, Mammals, and Embryoid Body show HUMAP offers competitive runtimes and superior hierarchical structure preservation, with reproducibility and scalability considerations discussed for practical deployment.
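
The kernel in the summary above can be illustrated with a small sketch. This is not the authors' implementation: the function names `knn` and `kernel_strengths` are hypothetical, the neighbor search is brute force, $\rho_i$ is assumed to be the distance from $x_i$ to its nearest neighbor (as in UMAP), and $\sigma_i$ is held at a single constant for simplicity rather than set per point.

```python
import math

def knn(points, k):
    """Brute-force k-nearest neighbors.

    Returns, for each point i, a list of (distance, index) pairs sorted by
    increasing distance. Hypothetical helper for illustration only.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append(dists[:k])
    return result

def kernel_strengths(neighbors, sigma=1.0):
    """Connection strength exp(-(d(x_i, x_j) - rho_i) / sigma) per neighbor.

    Assumptions: rho_i is the distance to the nearest neighbor of x_i, and
    sigma is one fixed value shared by all points (a simplification of the
    per-point sigma_i in the formula).
    """
    strengths = []
    for nbrs in neighbors:
        rho = nbrs[0][0]  # distance to the closest neighbor
        strengths.append(
            {j: math.exp(-(d - rho) / sigma) for d, j in nbrs}
        )
    return strengths

# Toy data: four points on a line, k = 2 neighbors each.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
strengths = kernel_strengths(knn(points, k=2))
```

Under these assumptions the nearest neighbor of every point always receives strength 1, and the strength of the remaining neighbors decays exponentially with their extra distance beyond $\rho_i$.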

Abstract

Dimensionality reduction (DR) techniques help analysts understand patterns in high-dimensional spaces. These techniques, often represented by scatter plots, are employed in diverse science domains and facilitate similarity analysis among clusters and data samples. For datasets containing many granularities or when analysis follows the information visualization mantra, hierarchical DR techniques are the most suitable approach since they present major structures beforehand and details on demand. This work presents HUMAP, a novel hierarchical dimensionality reduction technique designed to be flexible in preserving local and global structures and to preserve the mental map throughout hierarchical exploration. We provide empirical evidence of our technique's superiority compared with current hierarchical approaches and show a case study applying HUMAP for dataset labelling.

Paper Structure

This paper contains 23 sections, 5 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: The hierarchy is built from the bottom up. First, the connection strength between data points in the high-dimensional space is determined using a k-nearest neighbor graph and a kernel function (A). After several random walk steps, the graph's structure and the strength of the connections are used to determine which nodes have been visited the most (B). The landmarks correspond to the points in the higher hierarchy level (C). To repeat the same procedure for higher hierarchy levels, we calculate the intersection of representation neighborhoods (E), which were formed by joining local and global neighborhoods (D). With the exception of the first hierarchy level (whole dataset), we compute the k-nearest neighbors using a sorting algorithm (F). We employ a modified UMAP (McInnes et al., 2018) optimization for projecting hierarchy levels (or subsets of them). Finally, the graph is symmetrized (H) and the coordinates of projected points influence the positioning of subsequent levels (I).
  • Figure 2: Hierarchical exploration with HUMAP using mental map.
  • Figure 3: HUMAP exploration and annotation of a document collection of COVID-19 tweets. The top hierarchy level shows unlabeled data points and three major structures (A). We annotate these three clusters (B) and compute their topics (F). For each cluster in (B), we also project the corresponding (and final) hierarchy level to look for other patterns, annotating the dataset (C, D, E) and computing their topics.
  • Figure 4: Manually annotated UMAP projection with cluster topics.
  • Figure 5: Visual analysis of the embeddings generated for top and lowest hierarchical levels using a three-level hierarchy. For each dataset, top-level embedding appears on the left, and the lowest level (whole dataset) appears on the right.
  • ...and 4 more figures
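
Steps (B) and (C) of Figure 1, where random walks over the weighted neighbor graph determine the most visited nodes and those nodes become landmarks, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function `select_landmarks` and its parameters `n_walks`, `walk_length`, and `fraction` are hypothetical, each step moves to a neighbor with probability proportional to its connection strength, and landmarks are simply the top fraction of nodes by visit count.

```python
import random

def select_landmarks(strengths, n_walks=10, walk_length=20,
                     fraction=0.25, seed=0):
    """Pick landmarks as the most visited nodes of weighted random walks.

    strengths[i] maps each neighbor j of node i to a positive connection
    strength. All names and defaults here are illustrative assumptions.
    Returns (sorted landmark indices, per-node visit counts).
    """
    rng = random.Random(seed)
    n = len(strengths)
    visits = [0] * n
    for start in range(n):
        for _ in range(n_walks):
            node = start
            for _ in range(walk_length):
                nbrs = list(strengths[node].keys())
                weights = list(strengths[node].values())
                # Move to a neighbor, biased by connection strength.
                node = rng.choices(nbrs, weights=weights, k=1)[0]
                visits[node] += 1
    n_landmarks = max(1, int(fraction * n))
    ranked = sorted(range(n), key=lambda i: visits[i], reverse=True)
    return sorted(ranked[:n_landmarks]), visits

# Toy star graph: node 0 is connected to every other node,
# and every other node is connected only to node 0.
toy_graph = [
    {1: 1.0, 2: 1.0, 3: 1.0},
    {0: 1.0},
    {0: 1.0},
    {0: 1.0},
]
landmarks, visits = select_landmarks(toy_graph)
```

In the toy star graph every walk passes through the hub on every second step, so node 0 accumulates the most visits and is the single landmark selected at `fraction=0.25`.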