CliPS -- How to identify cluster distributions in Bayesian mixture models

Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, Bettina Grün

TL;DR

CliPS is a procedure for Bayesian mixture models in model-based clustering that identifies the cluster distributions while simultaneously assessing the suitability of a cluster solution and validating the cluster structure.

Abstract

We propose the CliPS procedure for Bayesian mixture models fitted in the context of model-based clustering. It identifies the cluster distributions while simultaneously assessing the suitability of a cluster solution and validating the cluster structure. The procedure relies on the point process representation of a mixture model and is based on the assumption that a suitable cluster solution requires the clusters to be distinguishable with respect to a low-dimensional functional of the component-specific parameters of the mixture. CliPS maps the component-specific MCMC draws to the point process representation and identifies clusters there, exploiting the fact that, while the data distributions usually overlap, the posteriors of these functionals become increasingly separated as the sample size grows. We outline the procedure and illustrate its use on several model-based clustering examples.
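
The idea of clustering MCMC draws in the point process representation can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which clustering method CliPS uses, so k-means (a plain Lloyd's algorithm) and the check that each draw's labels form a permutation are assumptions made here. The function names `kmeans` and `relabel_draws`, the toy means, and the simulated label switching are all hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm (an assumed stand-in for the clustering step)."""
    # Deterministic start: k points at evenly spaced ranks of the first coordinate.
    order = np.argsort(points[:, 0])
    centers = points[order[np.linspace(0, len(points) - 1, k).astype(int)]].copy()
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def relabel_draws(draws):
    """draws has shape (M, K, d): M MCMC iterations, K components,
    and a d-dimensional functional of each component's parameters."""
    M, K, d = draws.shape
    # Pool all component-specific draws into one point cloud and cluster it.
    labels, _ = kmeans(draws.reshape(M * K, d), K)
    labels = labels.reshape(M, K)
    # Keep a draw only if its K components fall into K different clusters.
    is_perm = np.array([len(set(row)) == K for row in labels])
    relabeled = np.empty((int(is_perm.sum()), K, d))
    for out, (row, draw) in enumerate(zip(labels[is_perm], draws[is_perm])):
        relabeled[out, row] = draw  # component i moves to slot row[i]
    return relabeled, is_perm.mean()

# Toy draws of three component means (simulated, not real MCMC output).
rng = np.random.default_rng(0)
true_means = np.array([-2.0, 0.0, 3.0])
draws = true_means + 0.15 * rng.standard_normal((500, 3))
# Mimic label switching by permuting the components within each draw.
draws = np.stack([row[rng.permutation(3)] for row in draws])[:, :, None]

relabeled, frac = relabel_draws(draws)
estimates = relabeled.mean(axis=0).ravel()
```

In this sketch, `frac` is the share of draws whose labels form a permutation. A value near one suggests that the components are well separated with respect to the chosen functional, while a low value would suggest that the cluster solution is not suitable.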

Paper Structure (21 sections, 13 equations, 12 figures, 1 table)

Figures (12)

  • Figure 1: Left: PPR of a univariate Gaussian mixture distribution with $K = 3$ components. Right: PPR of the MCMC draws obtained when fitting a Gaussian mixture distribution with $K = 3$ components to 500 observations simulated from the mixture on the left.
  • Figure 2: Example. Left: pairwise PPR of a six-dimensional Gaussian mixture model with $K = 4$ clusters using the mean parameters in the various dimensions for visualization. Right: corresponding pairwise scatter plots of 1,000 observations drawn from the mixture model on the left.
  • Figure 3: Example 2 ($K = 4$ known). Pairwise PPR of the MCMC draws using only the mean parameters in the various dimensions in gray together with the component means of the true mixture model indicated by black bullets.
  • Figure 4: Example 3 ($K$ unknown). Pairwise PPRs of the MCMC draws using the mean parameters in the various dimensions obtained for a Bayesian mixture of finite mixtures model. Left: all MCMC draws including all components. Right: MCMC draws where $\hat{K}_+ = 4$ components are filled and retaining only filled components, with draws colored according to the labeling obtained with CliPS.
  • Figure 5: Diabetes data. Left: pairwise scatter plots of the data colored by known classification. Right, top: trace plot of $K$ (gray) and $K_+$ (black) for the recorded iterations. Right, bottom: trace plot of $\eta_k$ for $k =1, \ldots, K$.
  • ...and 7 more figures

Theorems & Definitions (3)

  • Example 1: Illustrative example
  • Example 2: Known number of clusters
  • Example 3: Unknown number of clusters