Quasi-Bayes properties of a recursive procedure for mixtures

Sandra Fortini, Sonia Petrone

Abstract

Bayesian methods are often optimal, yet increasing pressure for fast computations, especially with streaming data, brings renewed interest in faster, possibly sub-optimal, solutions. The extent to which these algorithms approximate Bayesian solutions is a question of interest, but often unanswered. We propose a methodology to address this question in predictive settings, when the algorithm can be reinterpreted as a probabilistic predictive rule. We specifically develop the proposed methodology for a recursive procedure for online learning in nonparametric mixture models, often referred to as Newton's algorithm. This algorithm is simple and fast; however, its approximation properties are unclear. By reinterpreting it as a predictive rule, we can show that it underlies a statistical model which is, asymptotically, a Bayesian, exchangeable mixture model. In this sense, the recursive rule provides a quasi-Bayes solution. While the algorithm only offers a point estimate, our clean statistical formulation allows us to provide the asymptotic posterior distribution and asymptotic credible intervals for the mixing distribution. Moreover, it gives insights for tuning the parameters, as we illustrate in simulation studies, and paves the way to extensions in various directions. Beyond mixture models, our approach can be applied to other predictive algorithms.
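To make the recursive procedure concrete, here is a minimal sketch of Newton's recursion for a Gaussian location mixture, in the form commonly stated in the literature: $g_n(\theta) = (1-\alpha_n)\,g_{n-1}(\theta) + \alpha_n\, k(x_n\mid\theta)\,g_{n-1}(\theta) / \int k(x_n\mid\theta')\,g_{n-1}(\theta')\,d\theta'$. Assumptions of this sketch (not taken from the paper): the mixing distribution is discretized on a fixed grid, the kernel is $\text{N}(\theta, \sigma^2)$, and the weights are $\alpha_n = 1/(\alpha+n)$ indexed from $n=1$, matching the weights quoted in the Figure 2 caption. The function names `normal_pdf` and `newton_recursion` and the simulated data are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Gaussian density N(mean, sd^2), evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def newton_recursion(xs, grid, g0, sigma=1.0, alpha=1.0):
    """Recursive estimate of a mixing distribution for a Gaussian location mixture.

    The mixing distribution is stored as masses on a fixed `grid`; `g0` gives the
    initial (unnormalized) masses. Weights alpha_n = 1/(alpha + n), n = 1, 2, ...
    (assumed indexing).
    """
    g = np.asarray(g0, dtype=float)
    g = g / g.sum()
    for n, x in enumerate(xs, start=1):
        a_n = 1.0 / (alpha + n)
        post = normal_pdf(x, grid, sigma) * g   # k(x_n | theta) g_{n-1}(theta)
        post /= post.sum()                      # normalize over the grid
        g = (1.0 - a_n) * g + a_n * post        # convex combination: g stays a distribution
    return g


# Hypothetical example: true mixing distribution N(2, 1), kernel variance sigma^2 = 1.
rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(2.0, 1.0, size=n)
xs = theta + rng.normal(0.0, 1.0, size=n)
grid = np.linspace(-8.0, 8.0, 401)
g_n = newton_recursion(xs, grid, np.ones_like(grid), sigma=1.0, alpha=1.0)
density = g_n / (grid[1] - grid[0])            # masses -> density values on the grid

# The estimate depends on the order of the data: rerun on a random permutation.
g_perm = newton_recursion(rng.permutation(xs), grid, np.ones_like(grid))
```

Because each update is a convex combination of two distributions on the grid, `g_n` remains a probability vector at every step. The rerun on a permuted sample gives a different estimate, which is the order dependence that the figures visualize by plotting estimates over random permutations of the sample.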

Paper Structure

This paper contains 20 sections, 14 theorems, 103 equations, 8 figures.

Key Result

Theorem 3.1

Let the process $((X_n, \theta_n))$ have a probability law $P$ that satisfies assumptions (eq:newtonAsPred). Then, $P$-a.s.,

Figures (8)

  • Figure 1: Monte Carlo approximation of the prior density of $G(0)$. First panel: Monte Carlo samples $G_N^{(m)}$, $m=1, \ldots, 1000$, $N=10,000$. Second panel: Histogram of the sampled $G^{(m)}(0), m=1, \ldots, 1000$ and corresponding Monte Carlo estimate of the prior density of $G(0)$ (solid curve) versus the Beta$(\alpha G_0(0), \alpha (1-G_0(0)))$ density (dotted).
  • Figure 2: Mixing density estimate $g_n$ and estimates obtained over $100$ random permutations of the original sample (plotted in gray). Simulated data from a location mixture of Gaussians: $\sigma^2=1$, $n=1000$. The true mixing density is the dashed curve. Panel (a): $\alpha_n=1/(\alpha+n)$, with $\alpha=1$. Panel (b): $\alpha_n=1/(\alpha+n)$, $\alpha=100$. Panel (c): $\alpha_n=1/(\alpha+n)^{2/3}$, $\alpha=100$. Panel (d): split-sample weights, $N=500$, $\gamma=3/4$; $\alpha=100$.
  • Figure 3: Mixing density estimate $g_n$ (black) and estimates obtained over $2000$ random permutations of the original sample (plotted in gray). Simulated data from a location mixture of Gaussians: $\sigma^2=1$, $n=5000$, and multimodal mixing density (dashed curve). Weights $\alpha_n$ as in Figure 2.
  • Figure 4: Mixing density estimate $g_n$ (black) and estimates obtained over $200$ random permutations of the sample (gray). Simulated data from a location mixture of Gaussians: $\sigma^2=0.1$, $n=5000$. Multimodal mixing density (dashed) and weights $\alpha_n$ as in Figure 2.
  • Figure 5: Mixing density estimate $g_n$ (black) and estimates for $100$ random permutations of the original sample (gray). Simulated data from a location mixture of Gaussians: $\sigma^2=0.1$, $n=1000$. Panel (a): mixing density (dashed) $g^*=\text{N}(2,2)$. Panel (b): mixing density (dashed) $g^*=\text{N}(2,0.2)$. Weights $\alpha_n = 1/(\alpha+n)$, with $\alpha=50$.
  • ...and 3 more figures
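Figure 1 approximates the prior law of $G(0)$ by Monte Carlo: the predictive rule is used to simulate future observations, the recursion is run up to a large $N$, and $G_N(0)$ is recorded for each replicate. The sketch below is one way to do this under assumptions not stated in the caption: $G_0 = \text{N}(0,1)$ discretized on a grid, kernel $\text{N}(\theta, 1)$, $\alpha = 1$, weights $\alpha_n = 1/(\alpha+n)$, and far smaller $N$ and replicate counts than the caption's $N=10,000$ and $1000$ samples, to keep it fast. The function `sample_G_at_zero` is hypothetical, and "predictive draw" here means sampling $\theta$ from the current mixing distribution and then $X_n$ from the kernel.

```python
import numpy as np


def sample_G_at_zero(grid, g0, n_steps, n_draws, sigma=1.0, alpha=1.0, seed=0):
    """Monte Carlo draws of G_N(0) obtained by simulating from the predictive rule.

    Each replicate starts at G_0 (masses g0 on `grid`), draws X_n from the current
    mixture predictive, and applies the recursive update with alpha_n = 1/(alpha + n).
    """
    rng = np.random.default_rng(seed)
    g = np.tile(np.asarray(g0, dtype=float) / np.sum(g0), (n_draws, 1))
    for n in range(1, n_steps + 1):
        a_n = 1.0 / (alpha + n)
        # Predictive draw: theta ~ G_{n-1} by inverse CDF on the grid, X_n ~ N(theta, sigma^2).
        cdf = np.cumsum(g, axis=1)
        u = rng.random(n_draws) * cdf[:, -1]
        idx = np.minimum((cdf < u[:, None]).sum(axis=1), grid.size - 1)
        x = grid[idx] + sigma * rng.standard_normal(n_draws)
        # Recursive update; the Gaussian normalizing constant cancels, so it is omitted.
        post = np.exp(-0.5 * ((x[:, None] - grid[None, :]) / sigma) ** 2) * g
        post /= post.sum(axis=1, keepdims=True)
        g = (1.0 - a_n) * g + a_n * post
    # G_N(0) is the cumulative mass at grid points theta <= 0.
    return g[:, grid <= 0.0].sum(axis=1)


# Hypothetical setting: G_0 = N(0, 1) discretized on a grid, sigma = 1, alpha = 1.
grid = np.linspace(-6.0, 6.0, 241)
g0 = np.exp(-0.5 * grid ** 2)
G0_at_zero = g0[grid <= 0.0].sum() / g0.sum()
draws = sample_G_at_zero(grid, g0, n_steps=500, n_draws=400, alpha=1.0)
```

Under this sampling scheme the expected update equals the current distribution, so the average of `draws` should stay close to $G_0(0)$. A histogram of `draws` can then be set against the Beta$(\alpha G_0(0), \alpha(1-G_0(0)))$ density, as in the caption; with the assumed $\alpha=1$ and $G_0(0)\approx 1/2$, that reference density is close to Beta$(1/2, 1/2)$.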

Theorems & Definitions (18)

  • Definition 2.1
  • Theorem 3.1
  • Proposition 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Lemma 4.1
  • Theorem 4.1
  • Remark 4.1
  • Remark 4.2
  • ...and 8 more