Consensus dimension reduction via multi-view learning
Bingxue An, Tiffany M. Tang
TL;DR
The paper tackles the problem that different dimension reduction (DR) visualizations of the same data can diverge depending on the method and hyperparameters chosen. It introduces a consensus framework, CoMDS, and its local variant LoCoMDS, that fuse multiple DR outputs by converting embeddings to distance matrices and solving a multi-view MDS objective to extract shared structure. The approach yields embeddings that balance global and local structure preservation, are robust to method and parameter choice, and improve interpretability in real datasets (e.g., single-cell and olive oil data). The authors provide theoretical connections to INDSCAL/three-way MDS, practical tuning via adjusted LCMC, and an open-source R package, highlighting the framework’s potential to make low-dimensional visualizations more trustworthy and reproducible across diverse scientific domains.
Abstract
A plethora of dimension reduction methods have been developed to visualize high-dimensional data in low dimensions. However, different dimension reduction methods often output different and possibly conflicting visualizations of the same data. This problem is further exacerbated by the choice of hyperparameters, which may substantially impact the resulting visualization. To obtain a more robust and trustworthy dimension reduction output, we advocate for a consensus approach, which summarizes multiple visualizations into a single consensus dimension reduction visualization. Here, we leverage ideas from multi-view learning to identify the patterns that are most stable or shared across the many different dimension reduction visualizations, or views, and subsequently visualize this shared structure in a single low-dimensional plot. We demonstrate that this consensus visualization effectively identifies and preserves the shared low-dimensional data structure through both simulated and real-world case studies. We further highlight our method's robustness to the choice of dimension reduction method and hyperparameters, a highly desirable property when working towards trustworthy and reproducible data science.
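To make the consensus idea concrete, the following is a minimal Python sketch under simplifying assumptions; it is not the CoMDS or LoCoMDS algorithm and does not use the authors' R package. Each view (one embedding of the same points) is converted to a pairwise distance matrix, the matrices are rescaled to a common scale and averaged, and classical (Torgerson) MDS is applied to the average. The function names `pairwise_distances`, `classical_mds`, `consensus_embedding`, and `rotate`, the mean-distance rescaling, and the equal weighting of views are all illustrative choices rather than anything specified in the paper.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix D in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def consensus_embedding(embeddings, k=2):
    """Average the rescaled distance matrices of all views, then embed.

    Assumption: every view holds the same points in the same row order.
    Dividing by the mean distance is an illustrative way to put views on
    a comparable scale; views are weighted equally.
    """
    mats = []
    for E in embeddings:
        D = pairwise_distances(np.asarray(E, dtype=float))
        mats.append(D / D.mean())
    return classical_mds(np.mean(mats, axis=0), k=k)


def rotate(X, theta):
    """Rotate a 2-D configuration by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return X @ np.array([[c, -s], [s, c]])


# Usage: three synthetic "views" of one 2-D configuration that differ by
# rotation, scale, and a little noise (stand-ins for real DR outputs).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
views = [
    base + 0.05 * rng.normal(size=base.shape),
    2.0 * rotate(base, 0.8) + 0.05 * rng.normal(size=base.shape),
    0.5 * rotate(base, -1.1) + 0.05 * rng.normal(size=base.shape),
]
Z = consensus_embedding(views, k=2)  # one consensus plot, shape (50, 2)
```

Working with distance matrices rather than raw coordinates makes the result insensitive to the rotations and translations that distinguish otherwise equivalent embeddings. A multi-view MDS objective of the kind described above would replace the simple averaging step with a joint optimization over the views.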
