StarMAP: Global Neighbor Embedding for Faithful Data Visualization
Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
TL;DR
The paper addresses faithful visualization of high-dimensional data, targeting the global structure that conventional neighbor embeddings often neglect. It introduces StarMAP, which adds a star attraction to neighbor embedding: fixed anchor points ("stars") are obtained via K-means and PCA, and a tunable parameter balances this global guidance against local neighbor relations. Experiments on synthetic, real-world, and deep-representation datasets show improved preservation of both global and local structure and reveal semantic coherence in CLIP-based visualizations. The method is a simple yet effective augmentation of neighbor embedding that keeps its interpretability and scalability. Noted limitations concern the choice of the hyperparameter C and scenarios involving overlap in the PCA embedding, which suggests automatic C selection and robustness improvements as future directions.
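The anchor construction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`pca_fit`, `kmeans`, `star_anchors`, `star_attraction_step`) are hypothetical, the quadratic pull toward each star is an assumed form chosen for simplicity, and `C` is used here only as a weight on that pull, since the summary does not state the parameter's exact role in the objective. In a full method this step would be combined with the usual neighbor attraction and repulsion terms.

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Return the data mean and the top principal axes (as columns)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns centroids and cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def star_anchors(X, k):
    """Project K-means centroids with PCA to get fixed 2-D anchors ("stars")."""
    mean, axes = pca_fit(X)
    centroids, labels = kmeans(X, k)
    stars = (centroids - mean) @ axes
    return stars, labels

def star_attraction_step(Y, stars, labels, C, lr=0.1):
    """One gradient step pulling each embedded point toward its star.

    Assumed loss (hypothetical): (C / 2) * ||y_i - star_of(i)||^2.
    """
    grad = C * (Y - stars[labels])
    return Y - lr * grad
```

With `C = 0` the step leaves the embedding untouched, so only local neighbor forces would act; larger values pull the layout toward the PCA arrangement of the anchors.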
Abstract
Neighbor embedding is widely employed to visualize high-dimensional data; however, it frequently overlooks the global structure, e.g., intercluster similarities, thereby impeding accurate visualization. To address this problem, this paper presents Star-attracted Manifold Approximation and Projection (StarMAP), which incorporates the advantage of principal component analysis (PCA) into neighbor embedding. Inspired by the property of the PCA embedding, which can be viewed as the largest shadow of the data, StarMAP introduces the concept of star attraction by leveraging the PCA embedding. This approach yields faithful global structure preservation while maintaining the interpretability and computational efficiency of neighbor embedding. StarMAP was compared with existing methods on visualization tasks involving toy datasets, single-cell RNA sequencing data, and deep representations. The experimental results show that StarMAP is simple but effective in realizing faithful visualizations.
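One common reading of the "largest shadow" property is that, among all orthogonal projections onto a fixed number of dimensions, the PCA projection retains the most variance. The sketch below checks this numerically under that assumption; the function names are illustrative and do not come from the paper.

```python
import numpy as np

def projected_variance(X, axes):
    """Total variance of X after orthogonal projection onto the columns of `axes`."""
    Z = (X - X.mean(axis=0)) @ axes
    return Z.var(axis=0).sum()

def pca_axes(X, n_components=2):
    """Top principal directions of X, returned as columns."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:n_components].T

def random_orthonormal_axes(dim, n_components, rng):
    """A random orthonormal basis for an n_components-dimensional subspace."""
    Q, _ = np.linalg.qr(rng.normal(size=(dim, n_components)))
    return Q
```

Comparing `projected_variance(X, pca_axes(X))` with the value for any basis from `random_orthonormal_axes` should never favor the random basis, up to floating-point error.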
