Dimension Reduction with Locally Adjusted Graphs
Yingfan Wang, Yiyang Sun, Haiyang Huang, Cynthia Rudin
TL;DR
LocalMAP tackles a core weakness of graph-based dimension reduction (DR): a static high-dimensional graph that misrepresents local structure because high-dimensional distances are unreliable. By locally adjusting the weights of nearest-neighbor (NN) edges and resampling further-pair (FP) edges during optimization, LocalMAP suppresses false positives and strengthens cluster boundaries, yielding crisper and more accurate cluster separation. Across diverse datasets, LocalMAP achieves higher silhouette scores and robust cluster delineation, with reasonable runtime and stable performance under different initializations. The approach offers practical benefits for applications such as single-cell transcriptomics and other large-scale clustering problems, and it suggests avenues for combining dynamic graph adjustment with parametric DR frameworks.
Abstract
Dimension reduction (DR) algorithms have proven to be extremely useful for gaining insight into large-scale high-dimensional datasets, particularly for finding clusters in transcriptomic data. The initial phase of these DR methods often involves converting the original high-dimensional data into a graph in which each edge represents the similarity or dissimilarity between a pair of data points. However, this graph is frequently suboptimal because high-dimensional distances are unreliable and only limited information is extracted from the high-dimensional data. The problem is exacerbated as the dataset size increases. If we reduce the size of the dataset by selecting the points in a specific section of the embedding, the clusters observed through DR are more separable, since the extracted subgraphs are more reliable. In this paper, we introduce LocalMAP, a new dimension reduction algorithm that dynamically and locally adjusts the graph to address this challenge. By dynamically extracting subgraphs and updating the graph on the fly, LocalMAP is capable of identifying and separating real clusters within the data that other DR methods may overlook or merge. We demonstrate the benefits of LocalMAP through a case study on biological datasets, highlighting its utility in helping users more accurately identify clusters for real-world problems.
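To make the graph-construction step concrete, here is a minimal sketch of a brute-force k-nearest-neighbor graph on synthetic data, together with a count of how many of its edges join points with different labels. It is not LocalMAP's procedure and uses no LocalMAP code: the function names (`knn_graph`, `cross_label_fraction`, `make_data`), the two-cluster Gaussian data, and every parameter value are assumptions chosen purely for illustration.

```python
import math
import random


def knn_graph(points, k):
    """Return directed edges (i, j) linking each point i to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point, closest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def cross_label_fraction(edges, labels):
    """Fraction of edges whose two endpoints carry different labels."""
    return sum(labels[i] != labels[j] for i, j in edges) / len(edges)


def make_data(n_per_cluster=40, dim=50, shift=0.5, seed=0):
    """Two hypothetical Gaussian clusters whose means differ by `shift` per coordinate."""
    rng = random.Random(seed)
    points, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_cluster):
            points.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            labels.append(label)
    return points, labels


points, labels = make_data()
edges = knn_graph(points, k=5)
frac = cross_label_fraction(edges, labels)
print(f"{len(edges)} edges, {frac:.2%} join points with different labels")
```

Edges that join differently labeled points are the kind of unreliable connection the abstract describes. Varying `dim`, `shift`, or `n_per_cluster` in this toy setup is one way to see how the share of such edges changes, though behavior on real data may differ.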
