Enhanced High-Dimensional Data Visualization through Adaptive Multi-Scale Manifold Embedding
Tianhao Ni, Bingjie Li, Zhigang Yao
TL;DR
AMSME tackles high-dimensional data visualization by replacing absolute distances with ordinal rankings $o(x_i; x_j)$ and employing an adaptive multi-scale neighborhood to build a similarity graph. The method performs a two-stage nonlinear embedding, producing an initial layout $Y_1$ with pseudo-labels and a final layout $Y_2$ with enhanced inter-cluster separation via a label-driven reweighting of the distance matrix. Theoretical results show ordinal distances remain discriminative in high dimensions, and experiments on image and text datasets show consistent improvements over $t$-SNE, UMAP, and PaCMAP, with substantial gains in clustering accuracy and topology preservation. The approach also demonstrates multi-resolution analysis in scRNA-seq data, uncovering novel neuronal subtypes and associated marker genes, highlighting AMSME's practical impact for biology and beyond.
Abstract
To address the dual challenges of the curse of dimensionality and the difficulty of separating intra-cluster and inter-cluster structures in high-dimensional manifold embedding, we propose an Adaptive Multi-Scale Manifold Embedding (AMSME) algorithm. By replacing the traditional Euclidean distance with an ordinal distance, we theoretically demonstrate that the ordinal distance overcomes the constraints of the curse of dimensionality in high-dimensional spaces and effectively distinguishes heterogeneous samples. We design an adaptive neighborhood adjustment method to construct similarity graphs that balance intra-cluster compactness and inter-cluster separability. Furthermore, we develop a two-stage embedding framework: the first stage achieves preliminary cluster separation while preserving connectivity between structurally similar clusters via the similarity graph, and the second stage enhances inter-cluster separation through a label-driven distance reweighting. Experimental results on real-world datasets demonstrate that AMSME preserves intra-cluster topological structures and significantly improves inter-cluster separation. Additionally, leveraging its multi-resolution analysis capability, AMSME discovers novel neuronal subtypes in the mouse lumbar dorsal root ganglion scRNA-seq dataset, with marker gene analysis revealing their distinct biological roles.
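As a rough illustration of the idea of replacing absolute distances with ordinal rankings, the sketch below computes a rank-based dissimilarity with NumPy. It assumes, for illustration only, that $o(x_i; x_j)$ is the rank of $x_j$ among all points sorted by Euclidean distance from $x_i$; the precise definition used by AMSME may differ, and the function names `ordinal_ranks` and `symmetrize` are hypothetical.

```python
import numpy as np

def ordinal_ranks(X):
    """Return R with R[i, j] = rank of x_j among points ordered by distance from x_i.

    Assumption: this rank-based definition is illustrative, not the paper's exact one.
    Each point has rank 0 with respect to itself.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via the expansion |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, -np.inf)  # force each point to come first in its own ordering
    order = np.argsort(d2, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(n)[None, :]  # invert the permutation row by row
    return ranks

def symmetrize(ranks):
    """Average the two directed ranks to obtain a symmetric dissimilarity (an assumption)."""
    return 0.5 * (ranks + ranks.T)

# Three points on a line at positions 0, 1 and 3.
X = np.array([[0.0], [1.0], [3.0]])
R = ordinal_ranks(X)
S = symmetrize(R)
```

Because ranks depend only on the ordering of distances, rescaling the data or applying any monotone transformation to the distances leaves `R` unchanged, which gives some intuition for why an ordinal quantity can stay discriminative when raw distances concentrate in high dimensions.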
