Generalized Bayesian Multidimensional Scaling and Model Comparison

Jiarui Zhang, Jiguo Cao, Liangliang Wang

TL;DR

The paper proposes a Generalized Bayesian Multidimensional Scaling (GBMDS) framework that incorporates flexible dissimilarity metrics and robust non-Gaussian error structures, prioritizing uncertainty quantification, robustness, and model flexibility. It also designs an adaptive annealed Sequential Monte Carlo (ASMC) algorithm for Bayesian inference.

Abstract

Multidimensional scaling (MDS) is widely used to reconstruct a low-dimensional representation of high-dimensional data while preserving pairwise distances. However, Bayesian MDS approaches based on Markov chain Monte Carlo (MCMC) face challenges in model generalization and comparison. To address these limitations, we propose a generalized Bayesian multidimensional scaling (GBMDS) framework that accommodates non-Gaussian errors and diverse dissimilarity metrics for improved robustness. We develop an adaptive annealed Sequential Monte Carlo (ASMC) algorithm for Bayesian inference, leveraging an annealing schedule to enhance posterior exploration and computational efficiency. The ASMC algorithm also provides a nearly unbiased marginal likelihood estimator, enabling principled Bayesian model comparison across different error distributions, dissimilarity metrics, and dimensional choices. Using synthetic and real data, we demonstrate the effectiveness of the proposed approach. Our results show that ASMC-based GBMDS achieves superior computational efficiency and robustness compared to MCMC-based methods under the same computational budget. The implementation of our proposed method and applications are available at https://github.com/SFU-Stat-ML/GBMDS.
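
To make the role of the annealing schedule and the marginal likelihood estimator concrete, the following is a minimal sketch of an adaptive annealed SMC sampler. It is not the authors' implementation and does not fit an MDS model: it assumes a toy conjugate model (prior x ~ N(0, 1), one observation y | x ~ N(x, 1)) chosen because its marginal likelihood is known in closed form. All function names (`rcess`, `next_phi`, `asmc_log_evidence`), the bisection on the relative conditional effective sample size, the resampling threshold, and the random-walk Metropolis-Hastings move are illustrative assumptions.

```python
import numpy as np

def log_prior(x):
    # Assumed toy prior: x ~ N(0, 1)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_lik(x, y):
    # Assumed toy likelihood: y | x ~ N(x, 1)
    return -0.5 * (y - x)**2 - 0.5 * np.log(2 * np.pi)

def rcess(W, loglik, delta):
    # Relative conditional ESS of the incremental weights lik^delta,
    # given normalized weights W. Invariant to rescaling the increments.
    a = delta * loglik
    w = np.exp(a - a.max())
    return (W @ w) ** 2 / (W @ w**2)

def next_phi(W, loglik, phi, target=0.99):
    # Pick the next annealing parameter by bisection so that the
    # relative conditional ESS is approximately `target`.
    if rcess(W, loglik, 1.0 - phi) >= target:
        return 1.0
    lo, hi = phi, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if rcess(W, loglik, mid - phi) >= target:
            lo = mid
        else:
            hi = mid
    return hi  # strictly greater than phi, so the schedule always advances

def asmc_log_evidence(y, n=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)           # particles drawn from the prior
    logW = np.full(n, -np.log(n))        # normalized log weights
    phi, log_z = 0.0, 0.0
    while phi < 1.0:
        ll = log_lik(x, y)
        new_phi = next_phi(np.exp(logW), ll, phi)
        # Reweight: incremental weight is lik^(new_phi - phi)
        inc = logW + (new_phi - phi) * ll
        m = inc.max()
        log_sum = m + np.log(np.exp(inc - m).sum())
        log_z += log_sum                 # accumulate the evidence estimate
        logW = inc - log_sum
        phi = new_phi
        # Resample when the effective sample size drops below n / 2
        W = np.exp(logW)
        if 1.0 / np.sum(W**2) < 0.5 * n:
            x = x[rng.choice(n, size=n, p=W)]
            logW = np.full(n, -np.log(n))
        # Random-walk MH move leaving prior * lik^phi invariant
        prop = x + 0.5 * rng.standard_normal(n)
        log_acc = (log_prior(prop) + phi * log_lik(prop, y)
                   - log_prior(x) - phi * log_lik(x, y))
        accept = np.log(rng.uniform(size=n)) < log_acc
        x = np.where(accept, prop, x)
    return log_z

if __name__ == "__main__":
    y = 1.5
    exact = -0.5 * np.log(4 * np.pi) - y**2 / 4   # log N(y; 0, 2)
    print("ASMC estimate:", asmc_log_evidence(y), "exact:", exact)
```

In this toy setting the estimate can be checked against the exact value log N(y; 0, 2). In a model comparison setting, the same accumulated quantity would be computed once per candidate model and the resulting log marginal likelihoods compared.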

Paper Structure (26 sections, 27 equations, 6 figures, 4 tables, 3 algorithms)

Figures (6)

  • Figure 1: An illustration of the batch split.
  • Figure 2: (a) The histogram of the errors with 300 observations. (b) The boxplots of computation times for different models. The computation budget is kept consistent across all comparisons between pairs of Bayesian methods. (c) The boxplots of the log marginal likelihood for different models.
  • Figure 3: (a) The histograms of the dissimilarity $d_{i,j}$ under Euclidean metrics. The left histogram is from scenario 1, where the data contain 5% outliers. The right histogram is from scenario 2, where the data contain 15% outliers. (b) The boxplots of the log marginal likelihood for different models. Red: $\mathcal{M}_{TSN}^{\text{Euclidean}}$; Blue: $\mathcal{M}_{TT}^{\text{Euclidean}}$. Dimension $p$ is 2. (c) The boxplots of computation times for different models. The computation budget is kept consistent across all comparisons between pairs of Bayesian methods.
  • Figure 4: (a) The histogram of Cosine dissimilarity. Counts on $Y$-axis are in log scale. (b) The log marginal likelihood for different models under varying dimensions.
  • Figure 5: Estimated coordinates of the abstracts obtained using GBMDS-ASMC with $p=7$. The lower panel displays pairwise scatter plots, the diagonal shows density plots, and the upper panel provides contour plots of the density. The colors indicate three distinct clusters.
  • ...and 1 more figure