StarMAP: Global Neighbor Embedding for Faithful Data Visualization

Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

TL;DR

The paper tackles faithful visualization of high-dimensional data by addressing the global structure that conventional neighbor embeddings often neglect. It introduces StarMAP, which adds a star attraction term to neighbor embedding: fixed anchor points ("stars") are obtained via K-means and PCA, and a tunable parameter balances this global guidance against local neighbor relations. The approach is demonstrated on synthetic, real-world, and deep-representation datasets, showing improved preservation of both global and local structure and revealing semantic coherence in CLIP-based visualizations. The work offers a simple yet effective augmentation to neighbor embedding that retains its interpretability and computational efficiency. Noted limitations concern the hyperparameter C and scenarios where clusters overlap in the PCA embedding, suggesting directions for automatic C selection and robustness improvements.

Abstract

Neighbor embedding is widely employed to visualize high-dimensional data; however, it frequently overlooks the global structure, e.g., intercluster similarities, thereby impeding accurate visualization. To address this problem, this paper presents Star-attracted Manifold Approximation and Projection (StarMAP), which incorporates the advantage of principal component analysis (PCA) in neighbor embedding. Inspired by the property of PCA embedding, which can be viewed as the largest shadow of the data, StarMAP introduces the concept of "star attraction" by leveraging the PCA embedding. This approach yields faithful global structure preservation while maintaining the interpretability and computational efficiency of neighbor embedding. StarMAP was compared with existing methods in the visualization tasks of toy datasets, single-cell RNA sequencing data, and deep representation. The experimental results show that StarMAP is simple but effective in realizing faithful visualizations.
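
The idea of anchoring an embedding to the PCA projection can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain Lloyd-style K-means, the `weight` parameter, and the simple pull-toward-star update are all assumptions made for illustration, and the neighbor-embedding (UMAP-style) forces that the paper combines with star attraction are omitted entirely.

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Return the data mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; a stand-in for any K-means routine."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(X, centroids)

def anchor_stars(X, k, n_components=2):
    """Hypothetical helper: K-means centroids projected into PCA space."""
    centroids, labels = kmeans(X, k)
    mean, components = pca_fit(X, n_components)
    stars = (centroids - mean) @ components.T
    return stars, labels

def star_attraction_step(Y, stars, labels, weight=0.1):
    """Assumed update: move each embedded point a fraction `weight`
    of the way toward the fixed star of its cluster."""
    return Y + weight * (stars[labels] - Y)

# Toy usage on three Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.3, size=(50, 10)) for c in (0.0, 3.0, 6.0)])
stars, labels = anchor_stars(X, k=3)
Y = rng.normal(size=(len(X), 2))            # random initial embedding
Y = star_attraction_step(Y, stars, labels)  # one global-guidance step
```

Because the stars stay fixed in the PCA plane, repeated steps of this kind pull each cluster toward its PCA position; in a full method this pull would be weighed against the local attraction and repulsion between neighbors.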

Paper Structure

This paper contains 24 sections, 11 equations, 18 figures, 1 table.

Figures (18)

  • Figure 1: Visualization results on the Mammoth and MNIST datasets with PCA, UMAP, PaCMAP, and StarMAP (ours).
  • Figure 2: Illustration of the UMAP (black) and the proposed StarMAP (blue) algorithms.
  • Figure 3: Comparison of update procedure between UMAP and StarMAP.
  • Figure 4: Visualization result obtained by UMAP and StarMAP on synthetic hierarchical cluster dataset.
  • Figure 5: Visualization results on six real-world datasets. The color code of the Mammoth dataset reflects different body parts. For the MNIST, Fashion MNIST, and scRNA-seq datasets, the color codes identify the digit, fashion item, and cell clusters, respectively. In addition, the color code of the Cortex dataset shows the hierarchical structure within the inhibitory (hot), excitatory (cool), and non-neuron (gray) clusters, and that of the Planaria dataset shows the lineage from the neoblast (gray) to the resulting populations.
  • ...and 13 more figures