
Bayesian Repulsive Mixture Modeling with Matérn Point Processes

Hanxi Sun, Boqian Zhang, Minhyeok Kim, Vinayak Rao

TL;DR

This work proposes a Bayesian mixture model with repulsion between mixture components. Repulsion is introduced via a generalized Matérn type-III repulsive point process, obtained by applying a dependent sequential thinning scheme to a latent Poisson point process.

Abstract

Mixture models are a standard tool in statistical analyses, widely used for density modeling and model-based clustering. In this work, we propose a Bayesian mixture model with repulsion between mixture components. Such repulsion helps address the problem of overlapping or poorly separated clusters, and assists with model interpretability and robustness. Our modeling approach introduces repulsion via a generalized Matérn type-III repulsive point process model, and proceeds by applying a dependent sequential thinning scheme to a latent Poisson point process. A key feature of our model is that, in contrast to most existing approaches to modeling repulsion, efficient posterior inference is possible via a Gibbs sampler that exploits the latent Poisson process underlying our model. This novel sampler also allows posterior inference over the number of clusters, and is of independent interest even in standard clustering applications without repulsion. We demonstrate the utility of the proposed method on a number of synthetic and real-world problems.
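
To make the thinning construction concrete, the following is a minimal sketch of the hardcore special case in one dimension, the setting illustrated in Figure 1. It is not the paper's generalized model or its Gibbs sampler: the function name, the parameter names (`rate`, `length`, `radius`) and the use of uniform birth times on [0, 1] are assumptions made for illustration. A latent homogeneous Poisson process is sampled, each point receives an independent birth time, and points are visited in birth order; a point survives only if no earlier surviving point lies within `radius` of it.

```python
import random


def sample_hardcore_matern3(rate, length, radius, rng):
    """Sketch: 1-D hardcore Matern type-III process on [0, length].

    Returns (kept, thinned): surviving and deleted locations, each sorted.
    All names here are illustrative assumptions, not the paper's notation.
    """
    # Step 1: latent homogeneous Poisson process with intensity `rate`,
    # generated from i.i.d. exponential gaps between successive points.
    locations = []
    x = rng.expovariate(rate)
    while x < length:
        locations.append(x)
        x += rng.expovariate(rate)

    # Step 2: give every latent point an independent uniform birth time
    # and process the points from oldest to youngest.
    events = sorted((rng.random(), loc) for loc in locations)

    # Step 3: dependent sequential thinning. Only points that have already
    # survived can delete a younger point (the type-III rule).
    kept, thinned = [], []
    for _birth, loc in events:
        if any(abs(loc - k) < radius for k in kept):
            thinned.append(loc)
        else:
            kept.append(loc)
    return sorted(kept), sorted(thinned)


rng = random.Random(0)
kept, thinned = sample_hardcore_matern3(rate=5.0, length=10.0, radius=0.5, rng=rng)
print(f"{len(kept)} surviving points, {len(thinned)} thinned points")
```

By construction, any two surviving points are at least `radius` apart, and every thinned point lies within `radius` of some surviving point. Replacing the hard threshold in Step 3 with a probabilistic deletion rule would give a softer form of repulsion; that variation is also an assumption of this sketch rather than a description of the paper's kernel.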

Paper Structure

This paper contains 27 sections, 5 theorems, 16 equations, 27 figures, 12 tables, 3 algorithms.

Key Result

Theorem 3.1

Write $\mathscr{P}_\lambda$ for the law of a rate-$\lambda(\cdot)$ Poisson process on ${{\Theta}\times{\mathcal{W}}\times{\mathcal{T}}}\times \mathcal{M}$. Then the measure of the tuple ${\bm X}$, ${G}$, ${\widetilde{G}}$ has density with respect to ${d}x^n\times\mathscr{P}_\lambda$ given by

Figures (27)

  • Figure 1: The generative process of a one-dimensional hardcore Matérn process.
  • Figure 2: (Top) Posterior mean of the number of clusters $\mathbb{E}\left[\,{C}\,\middle|\,{\bm X}\,\right]$. (Bottom) Difference between the test log-likelihood under the posterior and under the true model $M_0$, $\log p\left({\bm X}_{\text{test}}\,\middle|\,{\bm X}\right) - \log p\left({\bm X}_{\text{test}}\,\middle|\,M_0\right)$.
  • Figure 3: Top: (Left) Scatterplot of data with true mixture density. (Middle) Kernel density estimate of pairwise distances. (Right) $d_{\min,k}$ versus $k$. Bottom: Contour plot and cluster assignments of the bivariate data for hardcore MRMM.
  • Figure 4: Chicago crime data, with contours/component assignments of hardcore MRMM.
  • Figure 5: The Malate dehydrogenase protein data, plotted (Left) on a torus and (Right) as a Ramachandran plot, where the torus is flattened to 2-d.
  • ...and 22 more figures

Theorems & Definitions (8)

  • Theorem 3.1
  • Proposition 4.1
  • Theorem \ref{prop:X-G-Gt}
  • proof
  • Lemma A.1
  • proof
  • Proposition \ref{prop:Gt}
  • proof