Understanding uncertainty in Bayesian cluster analysis

Cecilia Balocchi, Sara Wade

TL;DR

The paper tackles the difficulty of interpreting the posterior over the discrete, high-dimensional space of partitions in Bayesian clustering. It introduces WASABI, a Wasserstein-based approach that summarizes the posterior with a small number of weighted clustering partitions (particles), found by minimizing the Wasserstein distance $W_{\text{VI}}$ in the partition space endowed with the variation of information (VI) metric. A k-medoids-like procedure alternates between assigning samples to centers and updating centers to minimize the posterior expected VI, yielding interpretable centers, weights, and region-specific uncertainty tools (PSMs, VIC/VICG, EVIC, EVI). The method is demonstrated on synthetic data and real analyses (density regression for HPV uptake and single-neuron projection motifs), showing improved understanding of multimodal posteriors and robustness to misspecification, with an accompanying R package. WASABI thus provides a practical framework for communicating and quantifying uncertainty in Bayesian clustering and can be extended to other latent-variable models and losses.
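To make the VI metric on the partition space concrete, here is a minimal Python sketch of the variation of information between two partitions of the same observations. The function name `vi_distance`, the label-vector representation, and the choice of base-2 logarithms are assumptions made for illustration; they are not taken from the accompanying R package.

```python
from collections import Counter
from math import log2


def vi_distance(a, b):
    """Variation of information between two partitions.

    Each partition is a sequence of cluster labels, one per observation.
    Base-2 logarithms are an assumed convention; another base only rescales VI.
    """
    if len(a) != len(b):
        raise ValueError("partitions must label the same observations")
    n = len(a)
    size_a = Counter(a)          # block sizes of the first partition
    size_b = Counter(b)          # block sizes of the second partition
    joint = Counter(zip(a, b))   # sizes of the pairwise block intersections
    # VI = H(a | b) + H(b | a), accumulated over non-empty intersections.
    vi = 0.0
    for (i, j), n_ij in joint.items():
        vi -= (n_ij / n) * (log2(n_ij / size_a[i]) + log2(n_ij / size_b[j]))
    return vi
```

The distance depends only on which observations are grouped together, so relabeling the clusters of either partition leaves it unchanged.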

Abstract

The Bayesian approach to clustering is often appreciated for its ability to provide uncertainty in the partition structure. However, summarizing the posterior distribution over the clustering structure can be challenging, due to the discrete, unordered nature and massive dimension of the space. While recent advancements provide a single clustering estimate to represent the posterior, this ignores uncertainty and may even be unrepresentative in instances where the posterior is multimodal. To enhance our understanding of uncertainty, we propose a WASserstein Approximation for Bayesian clusterIng (WASABI), which summarizes the posterior samples with not one, but multiple clustering estimates, each corresponding to a different part of the partition space that receives substantial posterior mass. Specifically, we find such clustering estimates by approximating the posterior distribution in a Wasserstein distance sense, equipped with a suitable metric on the partition space. An interesting byproduct is that a locally optimal solution can be found using a k-medoids-like algorithm on the partition space to divide the posterior samples into groups, each represented by one of the clustering estimates. Using synthetic and real datasets, we show that WASABI helps to improve the understanding of uncertainty, particularly when clusters are not well separated or when the employed model is misspecified.
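The k-medoids-like idea can be sketched in a few lines of Python. This is an illustrative simplification rather than the authors' implementation: it assumes equally weighted posterior samples, restricts each center to be one of the sampled partitions, uses a farthest-point initialization, and measures distance with VI in base 2. The names `wasabi_sketch` and `vi` are hypothetical.

```python
from collections import Counter
from math import log2


def vi(a, b):
    """Variation of information (base 2, an assumed convention) between label vectors."""
    n, sa, sb = len(a), Counter(a), Counter(b)
    return -sum((c / n) * (log2(c / sa[i]) + log2(c / sb[j]))
                for (i, j), c in Counter(zip(a, b)).items())


def wasabi_sketch(samples, L, max_iter=100):
    """Group posterior samples around L centers chosen among the samples."""
    S = len(samples)
    D = [[vi(samples[s], samples[t]) for t in range(S)] for s in range(S)]

    def nearest(s, centers):
        # Index of the center closest to sample s (ties go to the lowest index).
        return min(range(len(centers)), key=lambda l: D[s][centers[l]])

    # Assumed initialization: overall medoid, then farthest-point additions.
    centers = [min(range(S), key=lambda s: sum(D[s]))]
    while len(centers) < L:
        centers.append(max(range(S), key=lambda s: min(D[s][c] for c in centers)))

    for _ in range(max_iter):
        # Assignment step: each sample joins its closest center.
        assign = [nearest(s, centers) for s in range(S)]
        # Update step: each center becomes the medoid of its own group.
        new_centers = []
        for l in range(L):
            group = [s for s in range(S) if assign[s] == l]
            if not group:  # keep a center whose group is empty
                new_centers.append(centers[l])
                continue
            new_centers.append(min(group, key=lambda s: sum(D[s][t] for t in group)))
        if new_centers == centers:
            break
        centers = new_centers

    assign = [nearest(s, centers) for s in range(S)]
    weights = [assign.count(l) / S for l in range(L)]
    # Average VI from each sample to its assigned center.
    objective = sum(D[s][centers[assign[s]]] for s in range(S)) / S
    return [samples[c] for c in centers], weights, objective
```

For example, calling `wasabi_sketch(samples, 2)` on a list of label tuples returns two representative partitions, the share of samples assigned to each, and the average VI to the assigned center. Like any k-medoids scheme, the result is only locally optimal and depends on the initialization.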

Paper Structure

This paper contains 32 sections, 2 theorems, 22 equations, 20 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

The WASABI posterior $q^*$, which is the solution to eq:WassOptimization, is found by identifying a set of "centers" or "particles" $\bm{\rho}^* = \{ \rho^*_1, \ldots, \rho^*_L \}$ that minimize where $\mathcal{N}_\ell = \{ \rho : d_{\text{VI}}(\rho,\rho^*_\ell) < d_{\text{VI}}(\rho, \rho^*_{\ell'}) \textrm{ for all } \ell' \neq \ell \}$ corresponds to the set of partitions that are closer to $\r

Figures (20)

  • Figure 1: Slightly bimodal example. The MAP partition of a DPM for data generated from a slightly bimodal mixture (Figure \ref{fig:data_raj3}) contains only a single cluster, yet the posterior similarity matrix (Figure \ref{fig:psm_raj3}) suggests uncertainty in additional clusters. To further examine the posterior on the space of partitions, Figure \ref{fig:maps_raj3} shows the log posterior relative to the MAP against the VI distance to the MAP, where each point represents a partition colored by its number of clusters, with cluster sizes reported for some (the set of partitions plotted consists of the MAP and those with 2 clusters that respect the order of the observed data). Multimodality in the posterior is evident with one mode corresponding to the MAP and another around partitions with two clusters of more equal size. WASABI summarizes with multiple, weighted partitions (Figure \ref{fig:wasabi_raj3}), reflecting the two different modes of clustering in this example.
  • Figure 2: Two-dimensional extension of the bimodal example. Data is generated from a Gaussian mixture with four components (Panel \ref{fig:4modes_scatter}). The minVI merges three components (Panel \ref{fig:4modes_minEVI}). To choose the number of particles $L$ in WASABI, we construct an elbow plot (Panel \ref{fig:4modes_elbow}) which suggests $L = 3$ particles achieves a balance between parsimony and minimizing the objective.
  • Figure 3: Several possible visualizations of the WASABI summary for Example \ref{ex:4modes2d}: (a) scatterplot of the data colored by particles' cluster assignment, (b) posterior similarity matrix for the samples within each region of attraction, (c) the particles' meet, (d) posterior similarity matrix using the WASABI approximation collapsed to meet's clusters, (e) comparison of two particles using VI contribution.
  • Figure 4: Experiment with different levels of cluster separation, determined by $m$. Left panel: posterior spread around the minVI estimator, quantified by its EVI. Right panel: improvement in Wasserstein distance achieved by WASABI for different values of $m$; the points are colored by the number of clusters in the corresponding minVI estimate.
  • Figure 5: Experiment with misspecified models. The top two rows show results for truncated Gaussian and skewed-t ($df=5$) distributions. Left: improvement in Wasserstein distance by WASABI versus posterior spread (EVI); central/right: particle scatterplots with cluster assignments and a summary plot, for one simulated dataset. The bottom row shows Wasserstein improvement for the skewed-t with varying degrees of freedom.
  • ...and 15 more figures
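
Two of the summaries named in the captions above, the posterior similarity matrix and the meet of several partitions, are simple to compute from label vectors. The Python sketch below is illustrative only: the function names are hypothetical, partitions are assumed to be given as sequences of cluster labels, and the optional `weights` argument is an assumed way to pass particle weights instead of equal sample weights.

```python
def posterior_similarity(partitions, weights=None):
    """Weighted co-clustering frequencies for every pair of observations.

    With no weights, every partition counts equally (plain posterior samples).
    Passing particle weights instead gives a similarity matrix for a weighted
    summary. Weights are assumed to sum to one.
    """
    n = len(partitions[0])
    if weights is None:
        weights = [1.0 / len(partitions)] * len(partitions)
    psm = [[0.0] * n for _ in range(n)]
    for w, rho in zip(weights, partitions):
        for i in range(n):
            for j in range(n):
                if rho[i] == rho[j]:
                    psm[i][j] += w
    return psm


def meet(partitions):
    """Coarsest partition that refines every input partition.

    Two observations share a block of the meet only if they share a block
    in all of the given partitions.
    """
    relabel = {}
    return [relabel.setdefault(key, len(relabel)) for key in zip(*partitions)]
```

For instance, `meet([(0, 0, 1, 1), (0, 1, 1, 1)])` separates the second observation from the first because the two partitions disagree about that pair, while the last two observations stay together.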

Theorems & Definitions (18)

  • Definition 1
  • Remark
  • Proposition 1
  • Definition 2
  • Proposition 2
  • Example 1
  • Example 2
  • Example 2: continued
  • Example 2: continued
  • Definition 3: VI contribution
  • ...and 8 more