Applications of Bayesian model selection to cosmological parameters

Roberto Trotta

TL;DR

This paper argues that Bayesian model selection, rather than traditional p-values, should guide the decision to add cosmological parameters, because it accounts for both the information gained from the data and the prior volume. It advocates the Savage-Dickey density ratio as a fast method to compute Bayes factors for nested models, and applies this framework to three cosmological questions using WMAP3 plus external data: whether the spectral index departs from scale invariance, whether the Universe is spatially flat, and whether the initial conditions are purely adiabatic. The results show moderate to strong evidence against a scale-invariant index, a strong preference for flatness, and decisive support for adiabatic initial conditions, with the caveat that prior choices can heavily influence the outcomes. Overall, the work demonstrates the value of Bayesian model comparison for cosmology, emphasizes the role of Occam’s razor in penalizing model complexity, and outlines how priors informed by instrument sensitivity and Bayes factor forecasting (PPOD) can guide future analyses.

Abstract

Bayesian model selection is a tool to decide whether the introduction of a new parameter is warranted by data. I argue that the usual sampling statistics significance tests for a null hypothesis can be misleading, since they do not take into account the information gained through the data when updating the prior distribution to the posterior. In contrast, Bayesian model selection offers a quantitative implementation of Occam's razor. I introduce the Savage-Dickey density ratio, a computationally quick method to determine the Bayes factor of two nested models and hence perform model selection. As an illustration, I consider three key parameters for our understanding of the cosmological concordance model. By using WMAP 3-year data complemented by other cosmological measurements, I show that a non-scale-invariant spectral index of perturbations is favoured for any sensible choice of prior. It is also found that a flat Universe is favoured with odds of 29:1 over non-flat models, and that there is strong evidence against a CDM isocurvature component to the initial conditions which is totally (anti)correlated with the adiabatic mode (odds of about 2000:1), although this last result is strongly dependent on the prior adopted. These results are contrasted with the analysis of WMAP 1-year data, which were not informative enough to allow a conclusion as to the status of the spectral index. In a companion paper, a new technique to forecast the Bayes factor of a future observation is presented.

Paper Structure

This paper contains 12 sections, 26 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The parameter space accessible a priori to WMAP in the $(\zeta,|\mathcal{S}|)$ plane is obtained by requiring better than $10\%$ accuracy on $\vert f_{\text{iso}}\vert$ in the Fisher matrix error forecast (open circles for the best case, crosses for the worst case, depending on the fiducial values of $\tau_r, n_S$ and on the sign of the correlation). This translates into a prior accessible range $0.4 \lesssim \vert f_{\text{iso}}\vert \lesssim 100$ (diagonal, dashed lines), but only if $\zeta, \vert \mathcal{S} \vert \gtrsim 10^{-5}$. Models which roughly satisfy the COBE measurement of the large-scale CMB anisotropies ($\delta T / T \approx 10^{-5}$) lie on the blue/solid line and have positive (negative) correlation left (right) of the cusp.
  • Figure 2: Equivalent priors on $\vert f_{\text{iso}}\vert$ corresponding to the flat priors used in Beltran et al. (2005) for the parameters $\alpha$ and $\sqrt{\alpha}$. Both priors cut away the parameter space $\vert f_{\text{iso}}\vert \gg 1$, thus reducing the Occam's razor effect caused by a scale-free parameter. The odds in favor of the purely adiabatic model thus become correspondingly smaller. Model comparison results can depend crucially on the variables adopted.
  • Figure 3: Regions in the $(I, \lambda)$ plane (shaded) where one of the competing models is supported by positive (odds of 3:1), moderate (12:1) or strong (odds larger than 150:1) evidence. The white region corresponds to an inconclusive result (odds of about 1:1), while in the region $I<0$ (dotted) the posterior is dominated by the prior and the measurement is non-informative. On the lower horizontal axis, $I$ is given in base 10, i.e. $I = - \log_{10}\beta$, while it is given in bits on the upper horizontal axis. The contours are computed from the SDDR formula assuming a Gaussian likelihood and a Gaussian prior. The location of the three parameters analyzed in the text is shown by diamonds (circles) for WMAP1+ext data (WMAP3+ext data). Choosing a wider (narrower) prior range would shift the points horizontally to the right (to the left) of the plot.
  • Figure 4: Illustration of Lindley's paradox. Sampling statistics hypothesis testing rejects the hypothesis that $\omega = \omega_\star$ with 95% confidence in all 3 cases (coloured curves) illustrated in the top panel ($\lambda = 1.96$ in all cases). Bayesian model selection does take into account the information content of the data $I$, and correctly favors the simpler model (predicting that $\omega = \omega_\star$) for informative data (right vertical line in the bottom panel, $I = 2$ expressed in base-10 logarithm), with odds of $14:1$ (for a Gaussian prior, dotted black line). Using a flat prior of the same width (solid black line) instead reduces $\ln B_{01}$ by a geometric factor $\ln(\pi/2)/2 \approx 0.23$ in the informative ($I \gg 1$) regime. Notice that for non-informative data ($I \ll 0$) the Bayes factor reverts to equal odds for the two models.
  • Figure 5: Benchmark test for the SDDR formula for a Gaussian likelihood and prior, for parameter spaces of dimensionality $D$. The horizontal, dotted lines give the exact value. The SDDR performs extremely well for comparing models lying $\lambda < 3$ sigma away from each other. In this case, fewer than $10^5$ samples are required to achieve a satisfactory agreement with the exact result. For $\lambda \gtrsim 4$ the tails of the likelihood are not sufficiently explored to apply the SDDR. The missing points for $\lambda = 3$ indicate that the given number of samples is insufficient to achieve coverage of the simpler model prediction.
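
The Gaussian setup described in the captions of Figures 3 to 5 can be sketched numerically. The Savage-Dickey density ratio gives the Bayes factor of two nested models as the posterior density of the extra parameter at the simpler model's prediction divided by its prior density there, $B_{01} = p(\omega_\star \vert d) / \pi(\omega_\star)$. The closed form used below is worked out here from that ratio, assuming a Gaussian likelihood of width $\sigma$ and a Gaussian prior of width $\Sigma$ centred on $\omega_\star$, with $\beta = \sigma/\Sigma$; it is not quoted from the paper. The function names, the bin width and the direct posterior sampling (used in place of an MCMC chain) are all illustrative assumptions.

```python
import math

import numpy as np


def ln_b01_gaussian(lam, info):
    """Closed-form ln B01 for a Gaussian likelihood and a Gaussian prior
    centred on the simpler model's prediction omega_star (an assumption).

    lam  : offset of the likelihood mean from omega_star, in units of sigma
    info : I = -log10(beta), with beta = sigma / Sigma
    """
    beta = 10.0 ** (-info)
    occam = 0.5 * math.log(1.0 + beta ** -2)       # prior-volume (Occam) term
    misfit = lam ** 2 / (2.0 * (1.0 + beta ** 2))  # penalty for the offset
    return occam - misfit


def ln_b01_from_samples(samples, omega_star, prior_density, half_width):
    """Sample-based SDDR: posterior density at omega_star over prior density.

    The posterior density is estimated by counting samples in a narrow bin
    around omega_star. If no sample lands in the bin, math.log raises
    ValueError, i.e. the chain does not cover the simpler model.
    """
    in_bin = np.abs(samples - omega_star) < half_width
    posterior_density = in_bin.mean() / (2.0 * half_width)
    return math.log(posterior_density / prior_density)


# Informative case of Figure 4: lambda = 1.96, I = 2 (so Sigma = 100 sigma).
sigma, prior_width, omega_star = 1.0, 100.0, 0.0
mu = 1.96 * sigma
ln_b_exact = ln_b01_gaussian(lam=1.96, info=2.0)

# Assumption: draw directly from the Gaussian posterior instead of running MCMC.
post_var = 1.0 / (1.0 / sigma**2 + 1.0 / prior_width**2)
post_mean = post_var * (mu / sigma**2 + omega_star / prior_width**2)
rng = np.random.default_rng(0)
samples = rng.normal(post_mean, math.sqrt(post_var), size=200_000)

prior_density = 1.0 / (math.sqrt(2.0 * math.pi) * prior_width)
ln_b_mc = ln_b01_from_samples(samples, omega_star, prior_density, half_width=0.1)

print(f"closed-form B01 = {math.exp(ln_b_exact):.1f}")
print(f"sample-based B01 = {math.exp(ln_b_mc):.1f}")
```

Under these assumptions the closed form gives odds between 14:1 and 15:1 in favour of the simpler model, in line with the 14:1 quoted for Figure 4, even though a significance test would reject $\omega = \omega_\star$ at 95% confidence. Raising `lam` towards 4 while keeping the sample size fixed leaves only a handful of samples in the bin, so the sample-based estimate degrades, which is the behaviour the Figure 5 caption describes.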