Empirical Bayes in Bayesian learning: understanding a common practice

Stefano Rizzelli; Judith Rousseau; Sonia Petrone

TL;DR

This work gives formal content to common beliefs about empirical Bayes, a popular practice for approximating the Bayesian posterior distribution. It covers both identifiable models, with applications to sparse regression, and non-identifiable models, specifically overfitted mixture models.

Abstract

In applications of Bayesian procedures, even when the prior law is carefully specified, it may be delicate to elicit the prior hyperparameters, so it is often tempting to fix them from the data, usually by their maximum marginal likelihood estimates (MMLE), obtaining a so-called empirical Bayes posterior distribution. Although questionable, this is a common practice, yet its theoretical properties seem to be available mostly on a case-by-case basis. In this paper we provide general properties for parametric models. First, we study the limit behavior of the MMLE and prove results in quite general settings, while also conceptualizing the frequentist context as an unexplored case of maximum likelihood estimation under model misspecification. We cover both identifiable models, illustrating applications to sparse regression, and non-identifiable models, specifically overfitted mixture models. Finally, we prove higher order merging results. In regular cases, the empirical Bayes posterior is shown to be a fast approximation to the Bayesian posterior distribution of the researcher who, within the given class of priors, has the most information about the true model's parameters. This approximation is faster than the one given by classic Bernstein-von Mises results. Given the class of priors, our work provides formal content to common beliefs on this popular practice.
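To make the practice described in the abstract concrete, the following is a minimal sketch on a toy model that is not taken from the paper: $X_i \mid \theta \sim N(\theta, 1)$ with prior $\theta \sim N(\lambda, \tau^2)$, where the prior mean $\lambda$ is the hyperparameter fixed from the data. The function names `log_marginal` and `posterior`, the grid search, and all numerical values are illustrative assumptions. In this toy model the marginal likelihood is maximized at the sample mean, so the grid MMLE should land next to it.

```python
import numpy as np

def log_marginal(x, lam, tau2=1.0):
    """Log marginal likelihood of x when X_i | theta ~ N(theta, 1)
    and theta ~ N(lam, tau2) (hypothetical toy model)."""
    n = x.size
    xbar = x.mean()
    ss = np.sum((x - xbar) ** 2)
    # Integrating theta out leaves xbar ~ N(lam, tau2 + 1/n);
    # the remaining terms do not depend on lam.
    var = tau2 + 1.0 / n
    const = -0.5 * (n - 1) * np.log(2 * np.pi) - 0.5 * np.log(n) - 0.5 * ss
    return const - 0.5 * np.log(2 * np.pi * var) - 0.5 * (xbar - lam) ** 2 / var

def posterior(x, lam, tau2=1.0):
    """Mean and variance of the conjugate normal posterior of theta."""
    prec = x.size + 1.0 / tau2
    mean = (x.sum() + lam / tau2) / prec
    return mean, 1.0 / prec

rng = np.random.default_rng(0)
theta0 = 1.0                          # assumed true parameter
x = rng.normal(theta0, 1.0, size=50)  # simulated data

# MMLE of the hyperparameter by grid search over the marginal likelihood.
grid = np.linspace(-5.0, 5.0, 2001)
lam_hat = grid[np.argmax([log_marginal(x, lam) for lam in grid])]

# Empirical Bayes posterior plugs lam_hat into the prior;
# compare with a Bayesian posterior under a fixed lam = 0.
eb_mean, eb_var = posterior(x, lam_hat)
fixed_mean, fixed_var = posterior(x, 0.0)
print(f"MMLE: {lam_hat:.3f}, sample mean: {x.mean():.3f}")
print(f"EB posterior mean: {eb_mean:.3f}, fixed-prior posterior mean: {fixed_mean:.3f}")
```

The grid search stands in for a numerical optimizer only to keep the sketch dependency-free; in this conjugate case the closed form $\hat\lambda_n = \bar{x}$ could be used directly.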

Paper Structure (39 sections, 11 theorems, 246 equations, 2 figures, 1 table)

Introduction
Our contributions and structure of the paper
Asymptotic behavior of the MMLE
General results for identifiable models
Main results
Finite overfitted mixtures
EBIB approximations to Bayesian inference
Higher order approximations for regular EBIB procedures
Higher order analysis of posterior merging
Higher order analysis of predictive merging
Examples and extensions
Final remarks
Acknowledgements.
Supplementary material.
Notation
...and 24 more sections

Key Result

Theorem 2.1

Assume Conditions \ref{cond:mod_prior}--\ref{cond:prior} hold true. Let $\Lambda^*_{KL}$ be the set of accumulation points of the sequences of asymptotic minimizers $\lambda_n^*$ of $KL(p_{\theta_0}^{(n)}\Vert m_\lambda^{(n)})$ over $\Lambda$. Then, $\Lambda^*_{KL} \subset \Lambda^*$.

Figures (2)

Figure 1: Bayes (solid) and EB posterior densities in Example \ref{ex:simple}.
Figure 2: Bayesian LASSO. Posterior densities of $\beta_1$ and $\beta_{14}$: EB with MMLE (black solid), Bayes with oracle hyperparameter $\lambda^*$ (gray solid), Bayes with $\lambda=1$ (dotted) and $\lambda=8$ (dashed). True values $\beta_{0,j}$ are marked as black bullets and EB posterior means as empty triangles.

Theorems & Definitions (24)

Theorem 2.1
Corollary 2.2
Theorem 2.3
Corollary 2.4
Example 2.1
Example 2.2: Discrete Markov Chain
Theorem 2.5
Example 3.1
Proposition 3.1
Remark 3.4
...and 14 more
