Median Consensus Embedding for Dimensionality Reduction

Yui Tomo, Daisuke Yoneoka

TL;DR

This work introduces Median Consensus Embedding (MCE), a statistically grounded method to mitigate instability in nonlinear dimensionality reduction by aggregating multiple embeddings into a single robust consensus. By modeling embeddings in a quotient embedding space and using the geometric median with a distance derived from pairwise-distance matrices, MCE achieves exponential-rate consistency under a large-deviations framework. The authors provide a concrete algorithm via the Weiszfeld method and demonstrate rapid convergence, reduced variability, and compatibility with missing data (via multiple imputation) and multiscale hyperparameters on real biological datasets. Overall, MCE offers a principled, scalable approach to stable low-dimensional representations, with practical benefits for visualization, downstream analysis, and cross-parameter robustness.

Abstract

This study proposes median consensus embedding (MCE) to address variability in low-dimensional embeddings caused by random initialization in nonlinear dimensionality reduction techniques such as $t$-distributed stochastic neighbor embedding. MCE is defined as the geometric median of multiple embeddings. By assuming multiple embeddings as independent and identically distributed random samples and applying large deviation theory, we prove that MCE achieves consistency at an exponential rate. Furthermore, we develop a practical algorithm to implement MCE by constructing a distance function between embeddings based on the Frobenius norm of the pairwise distance matrix of data points. Application to actual data demonstrates that MCE converges rapidly and effectively reduces instability. We further combine MCE with multiple imputation to address missing values and consider multiscale hyperparameters. Results confirm that MCE effectively mitigates instability issues in embedding methods arising from random initialization and other sources.
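The abstract describes MCE as a geometric median taken with respect to a distance built from the Frobenius norm of pairwise distance matrices, and the TL;DR names the Weiszfeld method as the solver. The sketch below illustrates that idea only at the level of distance matrices: it computes each embedding's Euclidean distance matrix and runs a plain Weiszfeld iteration on the stack. The function names, the arithmetic-mean starting point, the stopping rule, and the `eps` guard are assumptions made for this illustration, not the authors' Algorithm. The step that turns the median matrix back into coordinates is not described in this section and is omitted.

```python
import numpy as np


def pairwise_distance_matrix(y):
    """Euclidean distance matrix (n x n) of an embedding y with shape (n, p)."""
    diff = y[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def weiszfeld_median(mats, n_iter=200, tol=1e-9, eps=1e-12):
    """Geometric median of a stack of matrices under the Frobenius norm.

    Hypothetical helper: the defaults for n_iter, tol and eps are
    illustrative choices, not values taken from the paper.
    """
    mats = np.asarray(mats, dtype=float)
    median = mats.mean(axis=0)  # start from the arithmetic mean (assumption)
    for _ in range(n_iter):
        # Frobenius distance from the current estimate to every input matrix
        dists = np.sqrt(((mats - median) ** 2).sum(axis=(1, 2)))
        # Inverse-distance weights; eps guards against division by zero
        weights = 1.0 / np.maximum(dists, eps)
        new_median = np.tensordot(weights, mats, axes=1) / weights.sum()
        if np.linalg.norm(new_median - median) < tol:
            return new_median
        median = new_median
    return median


def median_distance_matrix(embeddings):
    """Median of the pairwise distance matrices of several embeddings."""
    mats = [pairwise_distance_matrix(np.asarray(y, dtype=float)) for y in embeddings]
    return weiszfeld_median(mats)
```

Working on distance matrices rather than raw coordinates makes the aggregation insensitive to rotations, reflections, and translations of individual embeddings, which is presumably why the distance is defined this way. Because the geometric median downweights far-away inputs, a single aberrant run has limited influence on the result.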

Paper Structure

This paper contains 28 sections, 4 theorems, 57 equations, 4 figures, 1 algorithm.

Key Result

Theorem 3.1

Suppose that Assumptions \ref{assump:trueembedding} and \ref{assump:mgf} are satisfied. Then, for any $\epsilon > 0$, there exist $M \in \mathbb{N}$, $K>0$, and $\eta > 0$ such that if $m > M$, then

Figures (4)

  • Figure 1: Visualization of the embedding of the datasets obtained by the MCE with $1000$ embeddings. The left column (a) shows the results obtained using ToxoLopit data and $t$-SNE, and the right column (b) shows the results obtained using Embryoid body data and UMAP.
  • Figure 2: Mean distance to $\hat{y}_{1000}$ and mean pairwise distance among embeddings for $t$-SNE embeddings and MCE embeddings ($m=2,\,10,\,20,\,50,\,100$). The results are shown with error bars indicating standard deviations (SD). The left column (a) shows the results obtained using ToxoLopit data and $t$-SNE, and the right column (b) shows the results obtained using Embryoid body data and UMAP.
  • Figure 3: Visualization of the embeddings of the combined approach with multiple imputation applied to ToxoLopit data. (a) Random missing scenario with $10\%$ missing rate. (b) Intensity-dependent missing scenario with $10\%$ missing rate. (c) Random missing scenario with $30\%$ missing rate. (d) Intensity-dependent missing scenario with $30\%$ missing rate.
  • Figure 4: Visualization of the embeddings by $t$-SNE with multiple perplexity values and the MCE applied to ToxoLopit data. (a) MCE of the specified perplexity settings. (b) Embedding with $\text{Perplexity}=10$. (c) Embedding with $\text{Perplexity}=30$. (d) Embedding with $\text{Perplexity}=90$. (e) Embedding with $\text{Perplexity}=270$.
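The two quantities reported in Figure 2, the mean distance to a reference embedding and the mean pairwise distance among embeddings, can be sketched as follows. This is a minimal illustration that assumes the distance between two embeddings is the Frobenius norm of the difference of their pairwise distance matrices, as the abstract describes. Any normalization used in the paper is not stated in this section, so none is applied. All function names are hypothetical.

```python
import itertools

import numpy as np


def pairwise_distance_matrix(y):
    """Euclidean distance matrix (n x n) of an embedding y with shape (n, p)."""
    diff = y[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def embedding_distance(y_a, y_b):
    """Frobenius norm of the difference between two pairwise distance matrices."""
    return float(
        np.linalg.norm(pairwise_distance_matrix(y_a) - pairwise_distance_matrix(y_b))
    )


def mean_distance_to_reference(embeddings, reference):
    """Average distance from each embedding to a fixed reference embedding."""
    return float(np.mean([embedding_distance(y, reference) for y in embeddings]))


def mean_pairwise_distance(embeddings):
    """Average distance over all unordered pairs of embeddings."""
    pairs = itertools.combinations(embeddings, 2)
    return float(np.mean([embedding_distance(a, b) for a, b in pairs]))
```

In a comparison like Figure 2, `reference` would play the role of $\hat{y}_{1000}$, and `embeddings` would hold repeated runs of either the base method or MCE at a given $m$. Smaller values of both quantities indicate more stable output.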

Theorems & Definitions (11)

  • Remark 3.1
  • Theorem 3.1: Consistency with exponential rate
  • Remark 4.1
  • Proposition 4.1: Construction of distance function on $\tilde{\mathcal{Y}}$
  • Proposition 4.2: Equivalence of optimization problems
  • Remark 5.1
  • Lemma A.1
  • proof
  • Proof of Theorem \ref{thm:main}
  • Proof of Proposition \ref{prop:distance_x}
  • ...and 1 more