Median Consensus Embedding for Dimensionality Reduction
Yui Tomo, Daisuke Yoneoka
TL;DR
This work introduces Median Consensus Embedding (MCE), a statistically grounded method that mitigates instability in nonlinear dimensionality reduction by aggregating multiple embeddings into a single robust consensus. By modeling embeddings in a quotient embedding space and taking the geometric median under a distance derived from pairwise-distance matrices, MCE achieves consistency at an exponential rate, proved within a large-deviations framework. The authors provide a concrete algorithm based on the Weiszfeld method and, on real biological datasets, demonstrate rapid convergence, reduced variability, and compatibility with missing data (via multiple imputation) and multiscale hyperparameters. Overall, MCE offers a principled approach to stable low-dimensional representations, with practical benefits for visualization, downstream analysis, and robustness across hyperparameter settings.
Abstract
This study proposes median consensus embedding (MCE) to address variability in low-dimensional embeddings caused by random initialization in nonlinear dimensionality reduction techniques such as $t$-distributed stochastic neighbor embedding. MCE is defined as the geometric median of multiple embeddings. By treating the multiple embeddings as independent and identically distributed random samples and applying large deviation theory, we prove that MCE achieves consistency at an exponential rate. Furthermore, we develop a practical algorithm to implement MCE by constructing a distance function between embeddings based on the Frobenius norm of the difference between their pairwise distance matrices of data points. Application to real data demonstrates that MCE converges rapidly and effectively reduces instability. We further combine MCE with multiple imputation to address missing values and consider multiscale hyperparameters. Results confirm that MCE effectively mitigates instability in embedding methods arising from random initialization and other sources.
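To make the construction concrete, the following is a minimal sketch of a Weiszfeld-type iteration for the geometric median of pairwise-distance matrices under the Frobenius norm. It is an illustration under stated assumptions, not the authors' implementation: the function names `pairwise_distances` and `median_distance_matrix`, the arithmetic-mean starting point, the stopping rule, and the `eps` guard are all hypothetical choices. The sketch stops at the consensus distance matrix; recovering low-dimensional coordinates from it is a further step that is not shown here.

```python
import numpy as np


def pairwise_distances(Y):
    """Euclidean pairwise-distance matrix of an (n, d) embedding."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def median_distance_matrix(embeddings, n_iter=100, tol=1e-8, eps=1e-12):
    """Weiszfeld iteration for the geometric median of distance matrices.

    `embeddings` is a sequence of (n, d) arrays over the same n points.
    All defaults are illustrative assumptions, not values from the paper.
    """
    D = np.stack([pairwise_distances(Y) for Y in embeddings])  # (m, n, n)
    M = D.mean(axis=0)  # start from the arithmetic mean
    for _ in range(n_iter):
        # Frobenius distance from the current estimate to each matrix
        r = np.linalg.norm(D - M, axis=(1, 2))
        w = 1.0 / np.maximum(r, eps)  # guard against division by zero
        # Weighted average of the distance matrices with weights 1 / r_i
        M_new = np.tensordot(w, D, axes=1) / w.sum()
        if np.linalg.norm(M_new - M) <= tol:
            return M_new
        M = M_new
    return M
```

Because rotated, reflected, or translated copies of an embedding share the same pairwise-distance matrix, this distance does not distinguish between them, which is consistent with treating embeddings as elements of a quotient space. The geometric median is also robust in the usual sense: a single aberrant embedding among several mutually consistent ones has little influence on the result.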
