DREAMS: Preserving both Local and Global Structure in Dimensionality Reduction

Noël Kury, Dmitry Kobak, Sebastian Damrich

TL;DR

Standard dimensionality reduction methods preserve either local or global data structure, but not both. DREAMS addresses this dichotomy by adding a PCA- or MDS-based regularization term to the $t$-SNE objective, yielding a controllable spectrum of embeddings between highly local and globally coherent layouts. The regularized objective is $(1-\lambda)\, \mathcal{L}_{t\text{-SNE}}(Y) + (\lambda/n) \|Y - \alpha \tilde{Y}\|_F^2$, where $Y$ is the current embedding, $\tilde{Y}$ is the global reference embedding, $\lambda$ tunes the local-global trade-off, and $\alpha$ scales the global embedding to the current embedding's magnitude. The authors benchmark DREAMS against a wide range of baselines on eleven real-world datasets and find that it often achieves the best combined local-global preservation; the DREAMS-MDS variant uses an MDS embedding instead of PCA as the global reference. A key finding is that a modest regularization strength ($\lambda \approx 0.15$) yields embeddings that retain both fine-grained clusters and broad groupings, improving interpretability for hierarchical data such as single-cell transcriptomics. The work provides open-source implementations and demonstrates the method's robustness and adaptability, though it notes a runtime trade-off and the absence of formal guarantees that the balance holds on all datasets.
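To make the regularized objective concrete, the following is a minimal NumPy sketch of the penalty term and its gradient with respect to $Y$. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pca_reference`, `dreams_penalty`, `dreams_objective`) are hypothetical, the $t$-SNE loss is passed in as a precomputed number, and $\alpha$ is treated as a fixed input because its exact update rule is not given here.

```python
import numpy as np


def pca_reference(X, dim=2):
    """Global reference embedding: project centered data onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T


def dreams_penalty(Y, Y_ref, lam, alpha):
    """Return (lam / n) * ||Y - alpha * Y_ref||_F^2 and its gradient w.r.t. Y.

    Assumption: alpha is held constant when differentiating.
    """
    n = Y.shape[0]
    diff = Y - alpha * Y_ref
    value = (lam / n) * np.sum(diff ** 2)
    grad = (2.0 * lam / n) * diff
    return value, grad


def dreams_objective(tsne_loss, Y, Y_ref, lam, alpha):
    """Combine a precomputed t-SNE loss value with the regularization penalty."""
    penalty, _ = dreams_penalty(Y, Y_ref, lam, alpha)
    return (1.0 - lam) * tsne_loss + penalty
```

In this sketch, $\lambda = 0$ leaves the $t$-SNE loss unchanged, while $\lambda = 1$ removes the $t$-SNE term, so the objective is minimized at $Y = \alpha \tilde{Y}$. Intermediate values trace out the spectrum between the two.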

Abstract

Dimensionality reduction techniques are widely used for visualizing high-dimensional data in two dimensions. Existing methods are typically designed to preserve either local (e.g., $t$-SNE, UMAP) or global (e.g., MDS, PCA) structure of the data, but none of the established methods can represent both aspects well. In this paper, we present DREAMS (Dimensionality Reduction Enhanced Across Multiple Scales), a method that combines the local structure preservation of $t$-SNE with the global structure preservation of PCA via a simple regularization term. Our approach generates a spectrum of embeddings between the locally well-structured $t$-SNE embedding and the globally well-structured PCA embedding, efficiently balancing both local and global structure preservation. We benchmark DREAMS across eleven real-world datasets, showcasing qualitatively and quantitatively its superior ability to preserve structure across multiple scales compared to previous approaches.

Paper Structure

This paper contains 22 sections, 12 equations, 20 figures, 3 tables.

Figures (20)

  • Figure 1: PCA, DREAMS, and $t$-SNE embeddings of the tasic2018shared dataset illustrate how DREAMS preserves the global organization seen in the PCA embedding, such as the separation of non-neurons, inhibitory, and excitatory neurons, while also capturing the local cell-type structure that is present in the $t$-SNE embedding.
  • Figure 2: Embeddings of the tasic2018shared dataset. a: Spectrum of DREAMS embeddings for different values of regularization strength $\lambda$. b--e: Embeddings obtained by some of the competing methods. For all embeddings, see \ref{tasic_embs}.
  • Figure 3: Quantitative evaluation of local and global structure preservation of different methods across multiple datasets (\ref{tab:data}). Spearman correlation of pairwise distances (CPD, global metric) is plotted against $k$NN recall (KNN, local metric). Different markers are used for visual clarity.
  • Figure 4: Trade-offs between local and global structure preservation in different methods. Spearman correlation of pairwise distances is plotted against $k$NN recall. a: Performance of DREAMS compared with other local-global spectra. b: Comparison of DREAMS-MDS and SQuadMDS-hybrid. c: Performance of different DREAMS variants. Panels a and c show results on the tasic2018shared dataset, while panel b is based on the packer2019lineage dataset.
  • Figure S1: Regularization error, KL divergence, scaling parameter $\alpha$ and embedding scale of DREAMS during optimization using different or no scaling methods. Results are reported as the mean over four random seeds on the tasic2018shared dataset.
  • ...and 15 more figures
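
The two metrics named in the captions of Figures 3 and 4 can be sketched as follows. This is a simplified NumPy illustration, not the evaluation code used in the paper: the function names are hypothetical, the default `k=10` is an assumed value, all pairwise distances are computed densely (only practical for small samples), and the rank-based Spearman correlation assumes no tied distances.

```python
import numpy as np


def pairwise_distances(A):
    """Dense Euclidean distance matrix between the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip small negatives from rounding


def knn_recall(X, Y, k=10):
    """Local metric: mean fraction of each point's k nearest neighbors in X kept in Y."""
    dX = pairwise_distances(X)
    dY = pairwise_distances(Y)
    np.fill_diagonal(dX, np.inf)  # a point is not its own neighbor
    np.fill_diagonal(dY, np.inf)
    nn_X = np.argsort(dX, axis=1)[:, :k]
    nn_Y = np.argsort(dY, axis=1)[:, :k]
    shared = [len(set(a) & set(b)) for a, b in zip(nn_X, nn_Y)]
    return float(np.mean(shared)) / k


def distance_correlation(X, Y):
    """Global metric: Spearman correlation of pairwise distances in X and in Y."""
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    dX = pairwise_distances(X)[iu]
    dY = pairwise_distances(Y)[iu]
    # Spearman = Pearson correlation of ranks (ties are not averaged here).
    rX = np.argsort(np.argsort(dX))
    rY = np.argsort(np.argsort(dY))
    return float(np.corrcoef(rX, rY)[0, 1])
```

Here `X` would hold the high-dimensional data and `Y` the two-dimensional embedding; both metrics reach 1 when the embedding preserves the respective structure perfectly.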