Blocked Gibbs Sampling for Improved Convergence in Finite Mixture Models

David Michael Swanson

Abstract

Gibbs sampling is a common procedure used to fit finite mixture models. However, it is known to be slow to converge when exploring correlated regions of a parameter space and so blocking correlated parameters is sometimes implemented in practice. This is straightforward to visualize in contexts like low-dimensional multivariate Gaussian distributions, but more difficult for mixture models because of the way latent variable assignment and cluster-specific parameters influence one another. Here we analyze correlation in the space of latent variables and show that latent variables of outlier observations equidistant between component distributions can exhibit significant correlation that is not bounded away from one, suggesting they can converge very slowly to their stationary distribution. We provide bounds on convergence rates to a modification of the stationary distribution and propose a blocked sampling procedure that significantly reduces autocorrelation in the latent variable Markov chain, which we demonstrate in simulation.

Paper Structure

This paper contains 12 sections, 6 theorems, 59 equations, 7 figures.

Key Result

Theorem 1

Consider the joint distribution of $C_i$ and $C_j$ given the complementary allocation ${\pmb C}_{\backslash i,j}$ defined above with probabilities $p_{11},\, p_{12},\, p_{21},\, p_{22}$. Provided $S_{{\{k \backslash b \}}}$ are non-singular, $n_{k\backslash b}>0$, $k=1,2$, then as $d_k \rightarrow \
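For intuition about the quantity the theorem concerns, the correlation between two binary allocation variables can be computed directly from their four joint probabilities. The sketch below is illustrative only and is not taken from the paper: it assumes $p_{ab}$ denotes $P(C_i = a, C_j = b)$ for $a, b \in \{1, 2\}$, and the function name `allocation_correlation` is hypothetical.

```python
import math


def allocation_correlation(p11, p12, p21, p22):
    """Pearson correlation of two binary allocations from their 2x2 joint pmf.

    Assumption (not from the source): p_ab = P(C_i = a, C_j = b), a, b in {1, 2}.
    """
    total = p11 + p12 + p21 + p22
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("joint probabilities must sum to 1")

    pi1 = p11 + p12  # marginal P(C_i = 1)
    pj1 = p11 + p21  # marginal P(C_j = 1)

    # Product of the two Bernoulli standard deviations.
    denom = math.sqrt(pi1 * (1.0 - pi1) * pj1 * (1.0 - pj1))
    if denom == 0.0:
        raise ValueError("correlation is undefined when a marginal is degenerate")

    # Covariance of the indicators 1{C_i = 1} and 1{C_j = 1}.
    return (p11 - pi1 * pj1) / denom
```

Under this reading, the correlation approaches one as the off-diagonal mass $p_{12} + p_{21}$ shrinks while both marginals stay non-degenerate, which is the regime the abstract describes as "not bounded away from one".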

Figures (7)

  • Figure 1: A surface plot of correlation between $C_i$ and $C_j$, latent variables of $Y_i$ and $Y_j$ which are of equal value, as a function of their placement between two component means. The two axes are the distances from $Y_i=Y_j$ to the two component means. The increasing ridge along the diagonal represents $Y_i$ and $Y_j$ equidistant between the two components and moving increasingly into their tails.
  • Figure 2: Two perspectives on the same correlation surface of $C_i$ and $C_j$, latent variables of $Y_i$ and $Y_j$, as a function of their placement between two components. The two axes are the distances from $Y_i$ and $Y_j$ to one of those component means, with the distance between the two components held constant. The diagonal across the top of the surface (that closest to the silhouette of the surface) corresponds to $Y_i$ and $Y_j$ of equal value, while the other diagonal (that going from front to back of the surface) corresponds to $Y_i$ and $Y_j$ mirroring one another across the cluster means' midpoint.
  • Figure 3: Contour plot of a mixture density composed of three Gaussian clusters or components, with one observation between each of the two pairs of adjacent components, marked with red X's. The latent variables of these observations would negatively correlate with respect to cluster 2 because they "pull" in different directions, but behave independently with respect to allocation to clusters 1 or 3, respectively.
  • Figure 4: Contour plot of a mixture density composed of three Gaussian clusters or components, with one observation midway between and above each of the two pairs of adjacent components, marked with red X's. The latent variables of these observations would correlate with respect to cluster 2, but behave independently with respect to allocation to clusters 1 or 3, respectively.
  • Figure 5: Posterior similarity matrices (PSMs) under the two different sampling regimes when the outlier block is equidistant between two components. Observations are in the same order along the x- and y-axes, where red indicates higher values of estimated $P(C_i=C_j)$ for PSM entry $i,j$ in posterior samples, and yellow indicates lower values.
  • ...and 2 more figures
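
The posterior similarity matrix described in the Figure 5 caption estimates $P(C_i=C_j)$ as the fraction of posterior draws in which observations $i$ and $j$ share a cluster label. A minimal sketch of that estimate follows; the function name `posterior_similarity_matrix` and the example draws are hypothetical and are not the paper's code or data.

```python
def posterior_similarity_matrix(samples):
    """Estimate P(C_i = C_j) from posterior allocation draws.

    samples: list of allocation vectors, one per retained draw, each of
    length n, where samples[t][i] is the cluster label of observation i.
    """
    if not samples:
        raise ValueError("need at least one posterior draw")
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]

    for draw in samples:
        if len(draw) != n:
            raise ValueError("all draws must have the same length")
        for i in range(n):
            for j in range(n):
                # Count draws in which i and j are allocated together.
                if draw[i] == draw[j]:
                    counts[i][j] += 1

    m = len(samples)
    return [[c / m for c in row] for row in counts]


# Hypothetical allocation draws for three observations.
draws = [[1, 1, 2], [1, 2, 2], [2, 2, 1], [1, 1, 1]]
psm = posterior_similarity_matrix(draws)
```

Because each entry depends only on whether two labels agree within a draw, the estimate is unaffected by relabeling of the clusters between draws.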

Theorems & Definitions (12)

  • Theorem 1
  • Corollary 1.1
  • Lemma 2
  • Lemma 3
  • proof
  • Theorem 4
  • proof
  • Theorem 5
  • proof: Proof of Theorem \ref{theorem_cor}
  • proof: Proof of Corollary \ref{corollary_cor}
  • ...and 2 more