A note on the area under the likelihood and the fake evidence for model selection

L. Martino; F. Llorente

Abstract

Improper priors are not allowed for the computation of the Bayesian evidence $Z=p({\bf y})$ (a.k.a. marginal likelihood), since in this case $Z$ is not completely specified due to an arbitrary constant involved in the computation. In this work, however, we remark that they can be employed in a specific type of model selection problem: when we have several (possibly infinitely many) models belonging to the same parametric family (i.e., for tuning the parameters of a parametric model). The quantities involved in this type of selection cannot, however, be considered Bayesian evidences: we suggest using the name ``fake evidences'' (or ``areas under the likelihood'' in the case of uniform improper priors). We also show that, in this model selection scenario, using a diffuse prior and letting its scale parameter grow to infinity does not recover the value of the area under the likelihood obtained with a uniform improper prior. We first discuss this from a general point of view. Then we provide, as an application example, all the details for Bayesian regression models with nonlinear bases, considering two cases: a uniform improper prior and a Gaussian prior. A numerical experiment is also provided, which confirms the previous statements.
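
The claim about diffuse priors can be illustrated with a minimal sketch. It is a toy example and not the regression model treated in the paper: it assumes a scalar mean $\theta$ with observations $y_i=\theta+e_i$, $e_i\sim\mathcal{N}(0,\sigma_e^2)$, and a Gaussian prior $\mathcal{N}(\mu_p,\sigma_p^2)$. Under these assumptions the evidence factorizes as $Z = S\cdot\mathcal{N}(\bar{y}\,|\,\mu_p,\sigma_p^2+\sigma_e^2/N)$, where $S$ is the area under the likelihood. The function names `log_area` and `log_evidence`, the data size, and the seed are hypothetical choices made only for illustration; the values $\sigma_e\in\{1,4\}$ and $\mu_p=2$ are borrowed from the caption of Figure 1.

```python
import numpy as np

def log_area(y, sigma_e):
    """log S: log of the integral of the likelihood over theta (flat improper prior).

    Toy model (an assumption of this sketch): y_i ~ N(theta, sigma_e^2), scalar theta.
    """
    n = y.size
    ss = np.sum((y - y.mean()) ** 2)
    # Log-likelihood evaluated at theta = mean(y).
    log_lik_at_mean = -0.5 * n * np.log(2 * np.pi * sigma_e**2) - ss / (2 * sigma_e**2)
    # The remaining Gaussian integral over theta contributes sqrt(2*pi*sigma_e^2/n).
    return log_lik_at_mean + 0.5 * np.log(2 * np.pi * sigma_e**2 / n)

def log_evidence(y, sigma_e, mu_p, sigma_p):
    """log Z under a proper Gaussian prior N(mu_p, sigma_p^2) on theta."""
    n = y.size
    var = sigma_p**2 + sigma_e**2 / n
    # Z = S * N(mean(y) | mu_p, var); the second factor vanishes as sigma_p grows.
    log_gauss = -0.5 * np.log(2 * np.pi * var) - (y.mean() - mu_p) ** 2 / (2 * var)
    return log_area(y, sigma_e) + log_gauss

rng = np.random.default_rng(0)            # hypothetical seed
y = 2.0 + rng.normal(0.0, 1.0, size=20)   # hypothetical synthetic data

for sigma_p in [1e1, 1e3, 1e5]:
    log_z1 = log_evidence(y, sigma_e=1.0, mu_p=2.0, sigma_p=sigma_p)
    log_z2 = log_evidence(y, sigma_e=4.0, mu_p=2.0, sigma_p=sigma_p)
    print(f"sigma_p={sigma_p:.0e}  log Z1={log_z1:.4f}  log Z1 - log Z2={log_z1 - log_z2:.6f}")

print("log S1 =", log_area(y, 1.0), " log S1 - log S2 =", log_area(y, 1.0) - log_area(y, 4.0))
```

In this toy setting, $\log Z$ keeps decreasing as $\sigma_p$ grows and never approaches $\log S$, whereas the difference $\log Z_1-\log Z_2$ settles to $\log S_1-\log S_2$, since the factor depending on $\sigma_p$ becomes common to both models.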

Paper Structure (32 sections, 88 equations, 3 figures, 2 tables)

Introduction
Elements in Bayesian inference
Levels in Bayesian inference
Type of model comparison
Use of vague priors and/or improper priors in Level-2
Diffuse/vague priors are informative for model selection
Improper priors: forbidden for computing the evidence $Z$
Uniform improper prior and the area under the likelihood
Key observations
Comparing models which differ for the chosen parameters
On the area under the likelihood $S$
On the empirical Bayes and profile likelihood approaches
Full-Bayesian solution with a double improper (uniform) prior
Example of application to Bayesian regression models
Problem statement
...and 17 more sections

Figures (3)

Figure 1: (a) The area under the likelihood $S=S({\bf y}|\sigma_e=1)$ and $Z$ in the log-domain as a function of $\sigma_p$. (b) Part 1 in $-\log Z$ converges to a constant, whereas part 2 in $-\log Z$ diverges. (c) The difference $\log Z_1 - \log Z_2$ converges to a constant (horizontal asymptote); $Z_1=p({\bf y}|\sigma_p,\sigma_e=1,\mu_p=2)$ corresponds to $\sigma_e=1$ and $Z_2=p({\bf y}|\sigma_p,\sigma_e=4,\mu_p=2)$ corresponds to $\sigma_e=4$.
Figure 2: (a) One realization of the data vector ${\bf y}$ and a corresponding fitted curve according to the observation model. (b) Example of the log area under the likelihood $\log S({\bf y}|{\bm \alpha})$ for one realization of the data vector ${\bf y}$. We can see that the maxima are located approximately at $[-4,5]$ and $[5,-4]$ (only $[-4,5]$ is admissible since $\alpha_1<\alpha_2$).
Figure 3: Histograms of each component of the estimated vector ${\bm \alpha}=[\alpha_1,\alpha_2]^{\top}$ (maximizing $S({\bf y}|{\bm \alpha})$) over $2000$ independent realizations. We can observe that the bias is virtually zero and that the variance is larger for $\alpha_1$ than for $\alpha_2$. This is reasonable given the realization of the data in Figure 2, where the (negative) peak at $x=5$ appears much clearer than the first peak at $x=-4$.
