A Bayesian approach to learning mixtures of nonparametric components

Yilei Zhang, Yun Wei, Aritra Guha, XuanLong Nguyen

TL;DR

This paper develops an efficient MCMC algorithm for posterior inference and demonstrates via simulation studies and real-world data illustrations that it is possible to efficiently learn complex forms of probability distribution for the latent subpopulations.

Abstract

Mixture models are widely used in modeling heterogeneous data populations. A standard approach of mixture modeling assumes that the mixture component takes a parametric kernel form. In many applications, making parametric assumptions on the latent subpopulation distributions may be unrealistic, which motivates the need for nonparametric modeling of the mixture components themselves. In this paper, we study finite mixtures with nonparametric mixture components, using a Bayesian nonparametric modeling approach. In particular, it is assumed that the data population is generated according to a finite mixture of latent component distributions, where each component is endowed with a Bayesian nonparametric prior such as the Dirichlet process mixture. We present conditions under which the individual mixture component's distribution can be identified, and establish posterior contraction behavior for the data population's density, as well as densities of the latent mixture components. We develop an efficient MCMC algorithm for posterior inference and demonstrate via simulation studies and real-world data illustrations that it is possible to efficiently learn complex forms of probability distribution for the latent subpopulations. In theory, the posterior contraction rate of the component densities is nearly polynomial, which is a significant improvement over the logarithmic convergence rates of estimating mixing measures via deconvolution.
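The generative structure described in the abstract (a finite mixture whose components each carry a Dirichlet process mixture prior) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel, the truncated stick-breaking approximation, the uniform and gamma base measures, the fixed mixture weights, the two disjoint location intervals, and all function names.

```python
import numpy as np

def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process (assumed approximation)."""
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # close the stick so the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_component(interval, alpha, truncation, rng):
    """Draw one random component: atoms (u, sigma) with every location u inside `interval`."""
    lo, hi = interval
    weights = stick_breaking(alpha, truncation, rng)
    locs = rng.uniform(lo, hi, size=truncation)                 # assumed base measure on u
    scales = rng.gamma(shape=2.0, scale=0.25, size=truncation)  # assumed base measure on sigma
    return weights, locs, scales

def sample_data(n, mix_weights, components, rng):
    """Sample n points: pick a component, then an atom within it, then a Gaussian draw."""
    labels = rng.choice(len(components), size=n, p=mix_weights)
    x = np.empty(n)
    for i, k in enumerate(labels):
        weights, locs, scales = components[k]
        j = rng.choice(len(weights), p=weights)
        x[i] = rng.normal(locs[j], scales[j])
    return x, labels

rng = np.random.default_rng(0)
intervals = [(-4.0, -1.0), (1.0, 4.0)]  # hypothetical disjoint location intervals
components = [sample_component(I, alpha=1.0, truncation=20, rng=rng) for I in intervals]
mix_weights = np.array([0.4, 0.6])      # fixed here; the paper places a prior on the weights
x, labels = sample_data(500, mix_weights, components, rng)
```

Each component is itself a flexible mixture, so the latent subpopulation densities are not tied to a single parametric family; confining each component's locations to its own interval is one simple way to keep the components distinguishable in this sketch.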

Paper Structure

This paper contains 31 sections, 24 theorems, 256 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 6.1

For any $f$ as defined in (interval_location_mixture) such that C1 or C2 is satisfied, there exist a unique $\sigma$, a unique collection of densities $f_{1} ,\cdots ,f_{K}$, and a corresponding weight vector $(w_1,\cdots, w_K)\in \Delta^{K-1}$ that satisfy the conditions in (interval_location_mixtu

Figures (6)

  • Figure 1: Graphical model representation of the MDPM \ref{hierarchical_model} for $K=2$. The dashed edges among $c_1,r_1,c_2,r_2$ represent the repulsive prior on $(\boldsymbol{c},\boldsymbol{r})$, which enforces disjoint intervals $I_1$ and $I_2$. For $i=1,2$, the base measure $H_{i0}(u,\sigma)$ is supported on $I_i\times(0,\infty)$. The dashed edge between $w_1$ and $w_2$ indicates the truncated Beta prior (equivalently, the two-parameter truncated Dirichlet) on the mixture weights $\boldsymbol{w}=(w_1,w_2)$. Finally, the mixture weights $\boldsymbol{w}=(w_1,w_2)$ combine the component distributions into the overall mixture $F=w_1G_1+w_2G_2$.
  • Figure 2: Example 1: a three-component mixture with the separation condition imposed on the location parameter $u$; $f_1$ is generated from a random combination of Hermite functions, $f_2$ is a skewed exponential-power distribution, and $f_3$ is a Laplace distribution. Example 2: a two-component mixture with the separation condition imposed on the scale parameter $\sigma$; both component densities are generated from random combinations of Hermite functions.
  • Figure 3: A bivariate two-component mixture model. Each component is a Gaussian mixture located on a circle with random covariance matrices. The left panel displays the true density, while the middle and right panels show the pointwise posterior means of the fitted component densities, weighted by their respective posterior mixture weights.
  • Figure 4: Density estimation contours from MDPM, KDE, and mixture of King's profiles, respectively.
  • Figure 5: Comparison of CDFs.
  • ...and 1 more figure

Theorems & Definitions (48)

  • Definition 2.1: Connectedness
  • Theorem 6.1
  • Theorem 7.1
  • Theorem 7.2
  • Remark
  • Lemma 7.1
  • proof: Proof of Theorem \ref{Theorem:identifiability}
  • Lemma D.1
  • proof
  • Lemma D.2
  • ...and 38 more