Empirical Bayes in Bayesian learning: understanding a common practice

Stefano Rizzelli, Judith Rousseau, Sonia Petrone

TL;DR

This work gives formal content to common beliefs about empirical Bayes, a popular practice for approximating Bayesian posterior distributions. It covers both identifiable models, with applications to sparse regression, and non-identifiable models, specifically overfitted mixture models.

Abstract

In applications of Bayesian procedures, even when the prior law is carefully specified, eliciting the prior hyperparameters can be delicate, so it is often tempting to fix them from the data, usually at their maximum marginal likelihood estimates (MMLE), obtaining a so-called empirical Bayes posterior distribution. Although questionable, this is common practice, yet its theoretical properties seem to be available mostly on a case-by-case basis. In this paper we provide general properties for parametric models. First, we study the limit behavior of the MMLE and prove results in quite general settings, while also conceptualizing the frequentist context as an unexplored case of maximum likelihood estimation under model misspecification. We cover both identifiable models, illustrating applications to sparse regression, and non-identifiable models, specifically overfitted mixture models. Finally, we prove higher-order merging results. In regular cases, the empirical Bayes posterior is shown to be a fast approximation to the Bayesian posterior distribution of the researcher who, within the given class of priors, has the most information about the true model's parameters. This approximation is faster than classic Bernstein-von Mises results. Given the class of priors, our work gives formal content to common beliefs about this popular practice.
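
To make the practice described in the abstract concrete, the following is a minimal sketch on a toy model that is not taken from the paper: $X_i \mid \theta \sim N(\theta, 1)$ with prior $\theta \sim N(\lambda, \tau^2)$, where $\tau^2$ is treated as known and only the prior mean $\lambda$ is fixed from the data. The function names (`log_marginal`, `mmle`, `posterior`), the search bounds, and the choice of the prior centered at the true value as a stand-in for the "most informed" prior are all assumptions of this illustration.

```python
import math
import random


def log_marginal(lam, x, tau2=1.0):
    """Log marginal likelihood of lam when X_i | theta ~ N(theta, 1)
    and theta ~ N(lam, tau2); marginally X ~ N(lam * 1, I + tau2 * 11^T)."""
    n = len(x)
    resid = [xi - lam for xi in x]
    ss = sum(r * r for r in resid)
    s = sum(resid)
    # Quadratic form with the closed-form inverse of I + tau2 * 11^T.
    quad = ss - tau2 / (1.0 + n * tau2) * s * s
    log_det = math.log(1.0 + n * tau2)
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def mmle(x, tau2=1.0, lo=-50.0, hi=50.0, iters=200):
    """Maximize the log marginal likelihood (concave in lam) by ternary search.
    The bounds lo/hi are illustrative assumptions."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_marginal(m1, x, tau2) < log_marginal(m2, x, tau2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def posterior(x, lam, tau2=1.0):
    """Conjugate posterior mean and variance of theta under prior N(lam, tau2)."""
    n = len(x)
    precision = n + 1.0 / tau2
    mean = (sum(x) + lam / tau2) / precision
    return mean, 1.0 / precision


random.seed(0)
theta0 = 2.0  # hypothetical true parameter, used only to simulate data
x = [random.gauss(theta0, 1.0) for _ in range(50)]

lam_hat = mmle(x)                                # data-driven hyperparameter
eb_mean, eb_var = posterior(x, lam_hat)          # empirical Bayes posterior
oracle_mean, oracle_var = posterior(x, theta0)   # prior centered at the truth

print(f"MMLE of lambda: {lam_hat:.4f} (sample mean {sum(x) / len(x):.4f})")
print(f"EB posterior mean {eb_mean:.4f}, oracle posterior mean {oracle_mean:.4f}")
```

In this toy model the MMLE coincides with the sample mean, and the two posteriors share the same variance and differ only in their means, by an amount that shrinks as the sample size grows. This only loosely mirrors the merging behavior the abstract describes and is not a substitute for the paper's results.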

Paper Structure (39 sections, 11 theorems, 246 equations, 2 figures, 1 table)

Key Result

Theorem 2.1

Assume Conditions cond:mod_prior--cond:prior hold. Let $\Lambda^*_{KL}$ be the set of accumulation points of the sequences of asymptotic minimizers $\lambda_n^*$ of $KL(p_{\theta_0}^{(n)}\Vert m_\lambda^{(n)})$ over $\Lambda$, i.e. the sequences satisfying [defining condition missing from the source]. Then $\Lambda^*_{KL} \subset \Lambda^*$.

Figures (2)

  • Figure 1: Bayes (solid) and EB posterior densities in Example \ref{ex:simple}.
  • Figure 2: Bayesian LASSO. Posterior densities of $\beta_1$ and $\beta_{14}$: EB with MMLE (black solid), Bayes with oracle hyperparameter $\lambda^*$ (gray solid), Bayes with $\lambda=1$ (dotted) and $\lambda=8$ (dashed). True values $\beta_{0,j}$ are marked as black bullets and EB posterior means as empty triangles.

Theorems & Definitions (24)

  • Theorem 2.1
  • Corollary 2.2
  • Theorem 2.3
  • Corollary 2.4
  • Example 2.1
  • Example 2.2: Discrete Markov Chain
  • Theorem 2.5
  • Example 3.1
  • Proposition 3.1
  • Remark 3.4
  • ...and 14 more