Consensus dimension reduction via multi-view learning

Bingxue An, Tiffany M. Tang

TL;DR

The paper tackles the problem that different dimension reduction (DR) visualizations of the same data can diverge due to method choice and hyperparameters. It introduces a consensus framework, CoMDS, and its local variant, LoCoMDS, which fuse multiple DR outputs by converting each embedding to a pairwise distance matrix and solving a multi-view MDS objective to extract the shared structure. The approach yields embeddings that balance global and local structure preservation, are robust to method and parameter choice, and improve interpretability on real datasets (e.g., single-cell and olive oil data). The authors provide theoretical connections to INDSCAL/three-way MDS, practical tuning via adjusted LCMC, and an open-source R package, highlighting the framework’s potential to make low-dimensional visualizations more trustworthy and reproducible across diverse scientific domains.
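
To make the "embeddings to distance matrices" step concrete, here is a minimal Python sketch of a naive consensus baseline. It is not the authors' CoMDS or LoCoMDS algorithm and does not use their R package: instead of solving a multi-view MDS objective, it simply rescales each view's distance matrix, averages them, and applies classical MDS. All function names (`pairwise_distances`, `classical_mds`, `naive_consensus_embedding`) and the mean-distance rescaling are assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, idx] * np.sqrt(lam)


def naive_consensus_embedding(embeddings, n_components=2):
    """Hypothetical baseline: average rescaled distance matrices, then MDS.

    Assumption: each view is divided by its mean distance so that views on
    different scales contribute equally. This is a stand-in for, not a copy
    of, the multi-view MDS objective described in the paper.
    """
    mats = []
    for E in embeddings:
        D = pairwise_distances(np.asarray(E, dtype=float))
        mats.append(D / D.mean())            # assumes the points are not all identical
    D_avg = np.mean(mats, axis=0)
    return classical_mds(D_avg, n_components)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(50, 2))
    # Three toy "views": the base layout, a scaled copy, and a noisy copy.
    views = [base, 2.0 * base, base + 0.1 * rng.normal(size=base.shape)]
    consensus = naive_consensus_embedding(views)
    print(consensus.shape)
```

Working on distance matrices rather than raw coordinates is what makes the views comparable: rotations, reflections, and translations of an embedding leave its pairwise distances unchanged, so only genuine disagreements in structure affect the average.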

Abstract

A plethora of dimension reduction methods have been developed to visualize high-dimensional data in low dimensions. However, different dimension reduction methods often output different and possibly conflicting visualizations of the same data. This problem is further exacerbated by the choice of hyperparameters, which may substantially impact the resulting visualization. To obtain a more robust and trustworthy dimension reduction output, we advocate for a consensus approach, which summarizes multiple visualizations into a single consensus dimension reduction visualization. Here, we leverage ideas from multi-view learning to identify the patterns that are most stable or shared across the many different dimension reduction visualizations, or views, and subsequently visualize this shared structure in a single low-dimensional plot. We demonstrate that this consensus visualization effectively identifies and preserves the shared low-dimensional data structure through both simulated and real-world case studies. We further highlight our method's robustness to the choice of dimension reduction method and hyperparameters, a highly desirable property when working towards trustworthy and reproducible data science.

Paper Structure

This paper contains 56 sections, 22 equations, 27 figures, 5 tables.

Figures (27)

  • Figure 1: Comparison of different dimension reduction methods applied to single-cell RNA sequencing data from peripheral blood mononuclear cells, collected from individuals with HIV infection (Kazer et al.), revealing visually different representations.
  • Figure 2: Overview of consensus dimension reduction framework. Given a set of candidate dimension reduction methods, we compute the pairwise distances in each dimension reduction space and subsequently extract the shared patterns across these distances to obtain our consensus dimension reduction embedding.
  • Figure 3: (A) Simulated mixture of Gaussians data. (B) Low-dimensional embeddings obtained from a subset of dimension reduction inputs, CoMDS, LoCoMDS, and other existing consensus dimension reduction approaches, applied to the mixture of Gaussians data.
  • Figure 4: (A) Simulated Swiss roll data. (B) Low-dimensional embeddings obtained from a subset of dimension reduction inputs, CoMDS, LoCoMDS, and other existing consensus dimension reduction approaches, applied to the Swiss roll data.
  • Figure 5: Low-dimensional embeddings obtained from a subset of dimension reduction inputs, CoMDS, LoCoMDS, and other existing consensus dimension reduction approaches, applied to the olive oil dataset.
  • ...and 22 more figures