HUMAP: Hierarchical Uniform Manifold Approximation and Projection
Wilson E. Marcílio-Jr, Danilo M. Eler, Fernando V. Paulovich, Rafael M. Martins
TL;DR
HUMAP is a hierarchical dimensionality reduction technique for scalable, multi-scale visualization of high-dimensional data. It preserves both local and global structure while keeping the mental map stable across hierarchy levels. The hierarchy is built bottom-up: a kernel-based similarity $p_{i|j}=e^{-(d(x_i,x_j)-\rho_i)/\sigma_i}$, computed over a fixed number of neighbors $k$, is combined with random walks to identify landmarks. Each level is then embedded with a UMAP-like objective guided by higher-level coordinates, and a parameter $\theta$ controls point movement so that the layout stays continuous during drill-down. The method supports subset projection by associating non-landmark points with landmarks and propagating representations across levels, which enables progressive analysis, as demonstrated in a COVID-19 tweet case study. Empirical results on MNIST, FMNIST, Mammals, and Embryoid Body show that HUMAP offers competitive runtimes and preserves hierarchical structure better than current hierarchical approaches. Reproducibility and scalability considerations for practical deployment are also discussed.
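The two building blocks named above, the kernel similarity and the random walks, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: $\rho_i$ is taken as the distance to the nearest neighbor, $\sigma_i$ is calibrated by binary search so that the similarities of point $i$ sum to $\log_2 k$ (the convention used by UMAP), and landmark candidates are scored by how often random walks end on them. The function names `knn_kernel_similarities` and `landmark_scores` are hypothetical.

```python
import numpy as np


def knn_kernel_similarities(X, k=5, n_iter=64):
    """Return (neighbor indices, similarities) over a fixed-k neighborhood."""
    # Brute-force pairwise Euclidean distances (fine for a small illustration).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]          # k nearest neighbors per point
    dist = np.take_along_axis(d, idx, axis=1)   # sorted neighbor distances
    rho = dist[:, 0]                            # assumption: rho_i = nearest-neighbor distance
    target = np.log2(k)                         # assumption: UMAP-style calibration target
    sigma = np.empty(len(X))
    for i in range(len(X)):
        # The similarity sum grows with sigma, so bisection finds the target.
        lo, hi = 0.0, 1e3 * max(dist[i, -1] - rho[i], 1e-12)
        for _ in range(n_iter):
            mid = 0.5 * (lo + hi)
            if np.exp(-(dist[i] - rho[i]) / mid).sum() > target:
                hi = mid
            else:
                lo = mid
        sigma[i] = 0.5 * (lo + hi)
    p = np.exp(-(dist - rho[:, None]) / sigma[:, None])
    return idx, p


def landmark_scores(idx, p, n_walks=20, walk_length=10, seed=0):
    """Count how many random walks end on each point (higher = better candidate)."""
    rng = np.random.default_rng(seed)
    n, k = idx.shape
    trans = p / p.sum(axis=1, keepdims=True)    # row-normalized transition probabilities
    hits = np.zeros(n, dtype=int)
    for start in range(n):
        for _ in range(n_walks):
            node = start
            for _ in range(walk_length):
                node = idx[node, rng.choice(k, p=trans[node])]
            hits[node] += 1                     # record the walk's endpoint
    return hits
```

Under these assumptions, a set of landmarks for the next level could be picked as the highest-scoring points, for example `np.argsort(hits)[::-1][:m]` for some chosen `m`. The actual selection rule used by HUMAP is not specified in this summary.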
Abstract
Dimensionality reduction (DR) techniques help analysts understand patterns in high-dimensional spaces. These techniques, often represented by scatter plots, are employed in diverse scientific domains and facilitate similarity analysis among clusters and data samples. For datasets containing many granularities, or when analysis follows the information visualization mantra, hierarchical DR techniques are the most suitable approach since they present major structures beforehand and details on demand. This work presents HUMAP, a novel hierarchical dimensionality reduction technique designed to be flexible in preserving local and global structures and to preserve the mental map throughout hierarchical exploration. We provide empirical evidence of our technique's superiority compared with current hierarchical approaches and show a case study applying HUMAP for dataset labelling.
