Cluster-based multidimensional scaling embedding tool for data visualization
Patricia Hernández-León, Miguel A. Caro
TL;DR
The paper addresses the challenge of visualizing high-dimensional data while preserving both local and global structures in a single 2-D embedding. It introduces cluster MDS (cl-MDS), which first computes $N_\text{cl}$ local MDS embeddings, one for each $k$-medoids cluster. It then selects up to four anchor points per cluster and embeds them together with MDS to build a global anchor map. Finally, it merges the local and global embeddings through per-cluster affine or projective transformations. The method also offers hierarchical embedding and sparsification, which let it scale to very large datasets, including atomic-structure datasets described with SOAP descriptors. Demonstrations on CHO, QM9, and PtAu nanoparticle data show that, compared with standard methods such as PCA, Isomap, t-SNE, and UMAP, cl-MDS better visualizes locality at multiple scales and supports meaningful medoid-based interpretation, with practical benefits for materials science and chemistry applications.
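The pipeline summarized above can be sketched in a few dozen lines of Python. This is a minimal illustration under several assumptions, not the authors' implementation. It uses only NumPy, classical (Torgerson) MDS, and a simple alternating k-medoids with a greedy farthest-point initialization. The anchor rule (the medoid, then greedily the point farthest from the anchors already chosen, up to four) is a hypothetical choice. Clusters are merged with a least-squares affine map only, and the projective transformation is not shown. All function names (`classical_mds`, `k_medoids`, `pick_anchors`, `cl_mds_sketch`) are illustrative.

```python
import numpy as np


def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS: embed a distance matrix D into `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                     # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                        # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]                 # indices of the largest `dim` eigenvalues
    X = V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))
    if X.shape[1] < dim:                            # tiny clusters: pad with zero columns
        X = np.hstack([X, np.zeros((n, dim - X.shape[1]))])
    return X


def k_medoids(D, k, n_iter=100):
    """Alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix."""
    # Greedy init: most central point, then repeatedly the point farthest from all medoids.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign each point to its nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:                        # new medoid minimizes within-cluster distance sum
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def pick_anchors(D_loc, medoid, max_anchors=4):
    """Assumed rule: the medoid, then greedily the point farthest from chosen anchors."""
    anchors = [medoid]
    while len(anchors) < min(max_anchors, D_loc.shape[0]):
        anchors.append(int(np.argmax(D_loc[:, anchors].min(axis=1))))
    return anchors


def cl_mds_sketch(D, k, dim=2):
    medoids, labels = k_medoids(D, k)
    clusters = [np.flatnonzero(labels == c) for c in range(k)]
    local, anchors = [], []
    for c, members in enumerate(clusters):
        D_loc = D[np.ix_(members, members)]
        local.append(classical_mds(D_loc, dim))     # local MDS embedding of one cluster
        medoid_loc = int(np.flatnonzero(members == medoids[c])[0])
        anchors.append(pick_anchors(D_loc, medoid_loc))
    # Global anchor map: a single MDS over the anchor points of all clusters.
    anchor_ids = np.concatenate([m[a] for m, a in zip(clusters, anchors)])
    G = classical_mds(D[np.ix_(anchor_ids, anchor_ids)], dim)
    X = np.zeros((D.shape[0], dim))
    start = 0
    for c, members in enumerate(clusters):
        a = anchors[c]
        src, dst = local[c][a], G[start:start + len(a)]
        start += len(a)
        if len(a) >= 3:
            # Least-squares affine map: [x, 1] @ M approximates the anchor's global position.
            A = np.hstack([src, np.ones((len(a), 1))])
            M = np.linalg.lstsq(A, dst, rcond=None)[0]
            X[members] = np.hstack([local[c], np.ones((len(members), 1))]) @ M
        else:
            # Fewer than 3 anchors cannot fix an affine map: translate only.
            X[members] = local[c] - src.mean(axis=0) + dst.mean(axis=0)
    return X, labels
```

For example, `X, labels = cl_mds_sketch(D, k=3)` embeds a precomputed distance matrix `D` into 2-D. For atomic structures, `D` could come from SOAP-based distances. Fitting one affine map per cluster preserves each local embedding up to a linear distortion, while the anchor map decides where clusters sit relative to one another.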
Abstract
We present a new technique for visualizing high-dimensional data called cluster MDS (cl-MDS), which addresses a common difficulty of dimensionality reduction methods: preserving both local and global structures of the original sample in a single 2-dimensional visualization. Its algorithm combines the well-known multidimensional scaling (MDS) tool with the $k$-medoids data clustering technique, and enables hierarchical embedding, sparsification, and estimation of 2-dimensional coordinates for additional points. While cl-MDS is a generally applicable tool, we also include specific recipes for atomic structure applications. We apply this method to non-linear data of increasing complexity where different layers of locality are relevant, showing a clear improvement in how well these layers are retrieved and visualized.
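The abstract mentions estimating 2-dimensional coordinates for additional points but does not spell out the procedure. One generic way to do it is shown below as a hedged sketch, not as the paper's method. The new point is assigned to the cluster with the nearest medoid, and its 2-D position is then found by linearized least-squares trilateration against that cluster's embedded points. The helper `place_new_point` and its arguments are hypothetical.

```python
import numpy as np


def place_new_point(d_new, X, labels, medoids):
    """Estimate 2-D coordinates for one additional point (hypothetical helper).

    d_new   : distances from the new point to every embedded point, shape (n,)
    X       : existing 2-D embedding, shape (n, 2)
    labels  : cluster label of each embedded point, shape (n,)
    medoids : index into X of each cluster's medoid
    """
    # Assign the new point to the cluster whose medoid is closest.
    c = int(np.argmin(d_new[medoids]))
    members = np.flatnonzero(labels == c)
    Xc, dc = X[members], d_new[members]
    if len(members) < 3:
        # Too few reference points to trilaterate: fall back to the nearest member.
        return Xc[np.argmin(dc)].copy(), c
    # Linearized trilateration: subtracting ||y - x_0||^2 = d_0^2 from
    # ||y - x_i||^2 = d_i^2 gives 2 (x_i - x_0) . y = |x_i|^2 - |x_0|^2 - d_i^2 + d_0^2.
    x0, d0 = Xc[0], dc[0]
    A = 2.0 * (Xc[1:] - x0)
    b = (Xc[1:] ** 2).sum(axis=1) - (x0 ** 2).sum() - dc[1:] ** 2 + d0 ** 2
    y = np.linalg.lstsq(A, b, rcond=None)[0]
    return y, c
```

Because `d_new` holds distances measured in the original high-dimensional space while `X` lives in 2-D, the equations are generally inconsistent, and the least-squares solution is a compromise. Restricting the fit to a single cluster keeps that compromise local.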
