Dimension Reduction with Locally Adjusted Graphs

Yingfan Wang, Yiyang Sun, Haiyang Huang, Cynthia Rudin

TL;DR

LocalMAP tackles the core weakness of graph-based DR: static high-dimensional graphs that misrepresent local structure due to unreliable distances. By locally adjusting NN edge weights and resampling FP edges during optimization, LocalMAP suppresses false positives and strengthens boundaries, enabling crisper, more accurate cluster separation. Across diverse datasets, LocalMAP achieves higher silhouette scores and robust cluster delineation, with reasonable runtime and stable performance under different initializations. This approach offers practical benefits for applications like single-cell transcriptomics and other large-scale clustering problems, and suggests avenues for integrating dynamic graph adjustments with parametric DR frameworks.

Abstract

Dimension reduction (DR) algorithms have proven to be extremely useful for gaining insight into large-scale high-dimensional datasets, particularly for finding clusters in transcriptomic data. The initial phase of these DR methods often involves converting the original high-dimensional data into a graph. In this graph, each edge represents the similarity or dissimilarity between a pair of data points. However, this graph is frequently suboptimal due to unreliable high-dimensional distances and the limited information extracted from the high-dimensional data. This problem is exacerbated as the dataset size increases. If we reduce the size of the dataset by selecting points from specific sections of the embedding, the clusters observed through DR are more separable, since the extracted subgraphs are more reliable. In this paper, we introduce LocalMAP, a new dimensionality reduction algorithm that dynamically and locally adjusts the graph to address this challenge. By dynamically extracting subgraphs and updating the graph on the fly, LocalMAP is capable of identifying and separating real clusters within the data that other DR methods may overlook or combine. We demonstrate the benefits of LocalMAP through a case study on biological datasets, highlighting its utility in helping users more accurately identify clusters for real-world problems.
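
The graph-construction step described in the abstract can be illustrated with a minimal sketch. This is not the LocalMAP or PaCMAP implementation: it builds a brute-force k-nearest-neighbor graph on synthetic data and counts how many NN edges connect points with different labels, which is one way a static high-dimensional graph can misrepresent cluster structure. The function names, the toy data, and the parameters (`dim`, `k`, the blob centers) are all assumptions made for illustration.

```python
import math
import random


def knn_graph(points, k):
    """Brute-force k-NN graph: maps each index i to its k nearest (j, distance) pairs."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [(j, d) for d, j in dists[:k]]
    return graph


def cross_cluster_edge_fraction(graph, labels):
    """Fraction of NN edges whose two endpoints carry different labels."""
    total = 0
    cross = 0
    for i, neighbors in graph.items():
        for j, _ in neighbors:
            total += 1
            if labels[i] != labels[j]:
                cross += 1
    return cross / total


def gaussian_blob(rng, center, n, dim):
    """n points drawn from an isotropic unit-variance Gaussian around `center`."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


# Hypothetical toy data: two overlapping clusters in 50 dimensions.
rng = random.Random(0)
dim = 50
points = gaussian_blob(rng, 0.0, 40, dim) + gaussian_blob(rng, 0.6, 40, dim)
labels = [0] * 40 + [1] * 40

graph = knn_graph(points, k=5)
frac = cross_cluster_edge_fraction(graph, labels)
print(f"fraction of NN edges crossing clusters: {frac:.3f}")
```

The brute-force search is quadratic in the number of points and is only meant for small examples; practical DR libraries typically rely on approximate nearest-neighbor search instead.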

Paper Structure

This paper contains 35 sections, 2 theorems, 15 equations, 19 figures, 8 tables.

Key Result

Theorem 1

Assume all points between two clusters are approximately equidistant, so that the probability of constructing a positive pair between points from these clusters is constant. The ratio between the number of NN edges to the number of FP edges of PaCMAP between two clusters increases with the number of

Figures (19)

  • Figure 1: DR embeddings on the MNIST dataset, which contains $10$ digit classes. Our LocalMAP method is on the right. The colored embeddings with true labels are shown in Figure 5.
  • Figure 2: Visualization of NN edge connections of PaCMAP embedding on MNIST dataset.
  • Figure 3: Left: PaCMAP embedding on the entire MNIST dataset, where six clusters are grouped into two large ones (each with three digit classes). Right: PaCMAP embeddings on each of the two groups of three digit classes. The right embeddings separate the classes where the left one does not because the partial datasets are smaller and thus do not suffer from the problem identified in Insight 2.
  • Figure 4: Curve of $\textrm{Coefficient}_{\text{NN}}$.
  • Figure 5: Case study on MNIST [lecun2010mnist], USPS [USPS], and Kang [kang2018multiplexed]. The Silhouette scores are shown in parentheses.
  • ...and 14 more figures
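
Figure 5 reports Silhouette scores for each embedding. As a reminder of what that metric measures, here is a minimal sketch of the standard silhouette coefficient with Euclidean distance, written from the textbook definition rather than taken from the paper's evaluation code. The function name and the four-point toy example are assumptions for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D embedding: two tight pairs of points, far apart.
embedding = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette_score(embedding, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_score(embedding, [0, 1, 0, 1])   # labels cut across the pairs
print(f"matching labels: {good:.3f}, mismatched labels: {bad:.3f}")
```

Scores near 1 indicate compact, well-separated clusters, while scores near or below 0 indicate that the labeled groups overlap in the embedding.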

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof