A note on the area under the likelihood and the fake evidence for model selection

L. Martino, F. Llorente

Abstract

Improper priors are not allowed for the computation of the Bayesian evidence $Z=p({\bf y})$ (a.k.a. marginal likelihood), since in that case $Z$ is not completely specified: it depends on an arbitrary constant involved in the computation. However, in this work we remark that improper priors can be employed in a specific type of model selection problem: when we have several (possibly infinitely many) models belonging to the same parametric family (i.e., for tuning the parameters of a parametric model). Nevertheless, the quantities involved in this type of selection cannot be considered Bayesian evidences: we suggest using the name "fake evidences" (or "areas under the likelihood" in the case of uniform improper priors). We also show that, in this model selection scenario, using a diffuse prior and increasing its scale parameter asymptotically to infinity, we cannot recover the value of the area under the likelihood obtained with a uniform improper prior. We first discuss this from a general point of view. Then we provide, as an application example, all the details for Bayesian regression models with nonlinear bases, considering two cases: the use of a uniform improper prior and the use of a Gaussian prior. A numerical experiment is also provided, confirming and checking all the previous statements.
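To make the two cases concrete, here is a minimal sketch for a regression that is linear in the weights, ${\bf y}=\Phi\theta+{\bf e}$ with ${\bf e}\sim\mathcal{N}(0,\sigma_e^2 I_N)$ and $\Phi$ of size $N\times M$. With a uniform improper prior $\pi(\theta)=1$, the Gaussian integral gives $\log S = -\frac{N-M}{2}\log(2\pi\sigma_e^2)-\frac{1}{2}\log|\Phi^\top\Phi|-\frac{\|{\bf y}-\Phi\hat\theta\|^2}{2\sigma_e^2}$, with $\hat\theta$ the least-squares estimate; with a Gaussian prior $\mathcal{N}(\mu_p{\bf 1},\sigma_p^2 I_M)$, $Z$ is also available in closed form. The helper names, the Gaussian-bump bases, the synthetic data, $\mu_p=2$ and the $\sigma_p$ values are illustrative assumptions, not the paper's exact experiment.

```python
import numpy as np

def design_matrix(x, centers, width=1.0):
    """Hypothetical nonlinear bases: Gaussian bumps exp(-(x - c_j)^2 / (2 width^2))."""
    return np.exp(-(x[:, None] - np.asarray(centers)[None, :]) ** 2 / (2.0 * width ** 2))

def log_area_under_likelihood(y, Phi, sigma_e):
    """log S: integral over theta of the Gaussian likelihood (uniform improper prior, pi(theta) = 1)."""
    N, M = Phi.shape
    A = Phi.T @ Phi
    theta_hat = np.linalg.solve(A, Phi.T @ y)  # least-squares (maximum-likelihood) estimate
    rss = np.sum((y - Phi @ theta_hat) ** 2)
    _, logdet_A = np.linalg.slogdet(A)
    return (-0.5 * (N - M) * np.log(2 * np.pi * sigma_e ** 2)
            - 0.5 * logdet_A - rss / (2 * sigma_e ** 2))

def log_evidence_gaussian_prior(y, Phi, sigma_e, sigma_p, mu_p):
    """log Z with the proper prior theta ~ N(mu_p * 1, sigma_p^2 I), computed in M-dimensional form."""
    N, M = Phi.shape
    mu = np.full(M, mu_p)
    P = Phi.T @ Phi / sigma_e ** 2 + np.eye(M) / sigma_p ** 2          # posterior precision
    m = np.linalg.solve(P, Phi.T @ y / sigma_e ** 2 + mu / sigma_p ** 2)  # posterior mean
    _, logdet_P = np.linalg.slogdet(P)
    # Data-fit and prior terms evaluated at the posterior mean (numerically stable form).
    quad = np.sum((y - Phi @ m) ** 2) / sigma_e ** 2 + np.sum((m - mu) ** 2) / sigma_p ** 2
    return (-0.5 * N * np.log(2 * np.pi * sigma_e ** 2) - M * np.log(sigma_p)
            - 0.5 * logdet_P - 0.5 * quad)

# Hypothetical synthetic data (not the paper's experiment).
rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 100)
Phi = design_matrix(x, [-4.0, 5.0])
M = Phi.shape[1]
y = Phi @ np.array([2.0, -4.0]) + 1.0 * rng.standard_normal(x.size)

log_S = log_area_under_likelihood(y, Phi, sigma_e=1.0)
for sigma_p in [1e1, 1e2, 1e3, 1e4, 1e5]:
    log_Z1 = log_evidence_gaussian_prior(y, Phi, 1.0, sigma_p, mu_p=2.0)  # sigma_e = 1
    log_Z2 = log_evidence_gaussian_prior(y, Phi, 4.0, sigma_p, mu_p=2.0)  # sigma_e = 4
    print(f"sigma_p={sigma_p:8.0e}  log Z1={log_Z1:10.3f}  log Z1 - log Z2={log_Z1 - log_Z2:8.4f}")
print(f"log S (sigma_e=1) = {log_S:.3f}")
```

In this sketch, $\log Z_1$ keeps dropping by roughly $M\log 10$ per decade of $\sigma_p$, so it does not settle at $\log S$, whereas $\log Z_1-\log Z_2$ stabilizes. The reason is that the $\sigma_p$-dependent term, asymptotically $-\frac{M}{2}\log(2\pi\sigma_p^2)$, does not depend on $\sigma_e$ and therefore cancels in the difference.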

Paper Structure (32 sections, 88 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: (a) The area under the likelihood $S=S({\bf y}|\sigma_e=1)$ and $Z$, in the log domain, as a function of $\sigma_p$. (b) Part 1 of $-\log Z$ converges to a constant, whereas part 2 of $-\log Z$ diverges. (c) The difference $\log Z_1 - \log Z_2$ converges to a constant (horizontal asymptote), where $Z_1=p({\bf y}|\sigma_p,\sigma_e=1,\mu_p=2)$ and $Z_2=p({\bf y}|\sigma_p,\sigma_e=4,\mu_p=2)$.
  • Figure 2: (a) One realization of the data vector ${\bf y}$ and the corresponding fitted curve according to the observation model. (b) Example of the log area under the likelihood, $\log S({\bf y}|{\bm \alpha})$, for one realization of the data vector ${\bf y}$. The maxima are located at approximately $[-4,5]$ and $[5,-4]$; only $[-4,5]$ is admissible, since $\alpha_1<\alpha_2$.
  • Figure 3: Histograms of each component of the estimated vector ${\bm \alpha}=[\alpha_1,\alpha_2]^{\top}$ (maximizing $S({\bf y}|{\bm \alpha})$) over $2000$ independent realizations. The bias is virtually zero, and the variance is larger for $\alpha_1$ than for $\alpha_2$. This is reasonable given the data realization in Figure 2(a), where the (negative) peak at $x=5$ is much more evident than the first peak at $x=-4$.
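Figures 2 and 3 describe choosing the basis locations ${\bm \alpha}$ by maximizing $\log S({\bf y}|{\bm \alpha})$, that is, using the area under the likelihood as a fake evidence across models of the same parametric family. The following is a minimal sketch of that tuning step. It assumes unit-width Gaussian bumps, a known $\sigma_e=0.5$, hypothetical data loosely mimicking the description above (a positive bump at $x=-4$ and a larger negative one at $x=5$), and an exhaustive 0.5-spaced grid restricted to $\alpha_1<\alpha_2$. All helper names and settings are hypothetical, and the grid search is an illustrative choice, not necessarily the paper's optimizer.

```python
import numpy as np

def design_matrix(x, alpha, width=1.0):
    """Hypothetical nonlinear bases: Gaussian bumps centred at alpha_j with an assumed fixed width."""
    return np.exp(-(x[:, None] - np.asarray(alpha)[None, :]) ** 2 / (2.0 * width ** 2))

def log_area_under_likelihood(y, Phi, sigma_e):
    """log S(y | alpha) under a uniform improper prior on the linear weights theta."""
    N, M = Phi.shape
    A = Phi.T @ Phi
    sign, logdet_A = np.linalg.slogdet(A)
    if sign <= 0:
        return -np.inf  # Phi^T Phi singular: S diverges along a flat direction, so exclude this alpha
    theta_hat = np.linalg.solve(A, Phi.T @ y)
    rss = np.sum((y - Phi @ theta_hat) ** 2)
    return (-0.5 * (N - M) * np.log(2 * np.pi * sigma_e ** 2)
            - 0.5 * logdet_A - rss / (2 * sigma_e ** 2))

def fit_alpha_by_max_area(x, y, sigma_e, grid):
    """Exhaustive grid search for alpha = [alpha_1, alpha_2] maximizing log S, with alpha_1 < alpha_2."""
    best_val, best_alpha = -np.inf, None
    for i, a1 in enumerate(grid):
        for a2 in grid[i + 1:]:  # grid is ascending, so only admissible pairs alpha_1 < alpha_2
            val = log_area_under_likelihood(y, design_matrix(x, [a1, a2]), sigma_e)
            if val > best_val:
                best_val, best_alpha = val, np.array([a1, a2])
    return best_alpha, best_val

# Hypothetical data: a positive bump at x=-4 and a larger negative bump at x=5.
rng = np.random.default_rng(1)
x = np.linspace(-10.0, 10.0, 200)
sigma_e = 0.5
y = design_matrix(x, [-4.0, 5.0]) @ np.array([2.0, -4.0]) + sigma_e * rng.standard_normal(x.size)

grid = np.arange(-10.0, 10.0 + 1e-9, 0.5)
alpha_hat, log_S_max = fit_alpha_by_max_area(x, y, sigma_e, grid)
print("estimated alpha:", alpha_hat, " max log S:", round(log_S_max, 3))
```

Repeating this over many independent realizations and histogramming $\hat{\bm \alpha}$ would mimic the kind of study summarized in Figure 3. In this sketch the grid spacing (0.5) bounds the precision of $\hat{\bm \alpha}$; a finer grid or a local optimizer started from the best grid point would refine it.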