Statistical methods in cosmology

Licia Verde

TL;DR

The work surveys essential statistical tools for cosmology, addressing how to extract information from high-dimensional data and how to forecast what future surveys will be able to learn. It foregrounds Bayesian inference, likelihoods, transformations, marginalization, and model comparison, and it demonstrates these methods in concrete cosmological contexts such as the CMB and BAO. It also covers practical forecasting with the Fisher matrix, pitfalls in combining data sets, and Monte Carlo techniques, providing a practical starter kit for robust parameter estimation and model testing in cosmology. The results equip researchers to quantify uncertainties on parameters such as $\Omega_m$, $H_0$, $w$, $n_s$, and $\sigma_8$, while guiding experimental design and ensuring principled interpretation of complex data sets.
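
To make the Fisher-matrix forecasting and marginalization ideas concrete, here is a minimal sketch. It assumes independent Gaussian errors and a parameter-independent covariance, in which case $F_{ij} = \sum_k \sigma_k^{-2}\,\partial m_k/\partial\theta_i\,\partial m_k/\partial\theta_j$. The toy linear model $m(x) = A + Bx$, the sampling points, the error bars and the helper names are hypothetical and chosen only for illustration. The forecast marginalized error on $\theta_i$ is $\sqrt{(F^{-1})_{ii}}$, while $1/\sqrt{F_{ii}}$ is the conditional error obtained when all other parameters are held fixed.

```python
import math

def fisher_matrix(derivs, sigmas):
    """F_ij = sum_k dm_k/dtheta_i * dm_k/dtheta_j / sigma_k^2.

    Assumes independent Gaussian errors sigma_k that do not depend on the parameters.
    derivs[i][k] is the derivative of model point k with respect to parameter i.
    """
    n_par = len(derivs)
    return [[sum(di * dj / s**2 for di, dj, s in zip(derivs[i], derivs[j], sigmas))
             for j in range(n_par)] for i in range(n_par)]

def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical toy "survey": model m(x) = A + B*x measured at xs with errors sigmas.
xs = [0.5, 1.0, 1.5, 2.0]
sigmas = [0.1, 0.1, 0.1, 0.1]
derivs = [[1.0 for _ in xs], xs]  # dm/dA = 1, dm/dB = x

F = fisher_matrix(derivs, sigmas)
cov = invert_2x2(F)  # forecast parameter covariance

# Marginalized errors (other parameter free) vs conditional errors (other parameter fixed).
marginalized = [math.sqrt(cov[i][i]) for i in range(2)]
conditional = [1.0 / math.sqrt(F[i][i]) for i in range(2)]
```

Because $A$ and $B$ are correlated here ($F_{AB} \neq 0$), the marginalized errors (about 0.122 and 0.089) come out larger than the conditional ones (0.05 and about 0.037); quoting conditional errors would overstate what such an experiment can learn.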

Abstract

The advent of large data sets in cosmology has meant that in the past 10 or 20 years our knowledge and understanding of the Universe has changed not only quantitatively but also, and most importantly, qualitatively. Cosmologists rely on data in which a host of useful information is enclosed, but encoded in a non-trivial way. The challenges in extracting this information must be overcome to make the most of a large experimental effort. Even after having converged to a standard cosmological model (the LCDM model), we should keep in mind that this model is described by 10 or more physical parameters, and if we want to study deviations from it, the number of parameters is even larger. Dealing with such a high-dimensional parameter space and finding parameter constraints is a challenge in itself. Cosmologists want to be able to compare and combine different data sets, both to test for possible disagreements (which could indicate new physics) and to improve parameter determinations. Finally, cosmologists in many cases want to find out, before actually doing the experiment, how much one would be able to learn from it. For all these reasons, sophisticated statistical techniques are being employed in cosmology, and it has become crucial to know some statistical background to understand recent literature in the field. I will introduce some statistical tools that any cosmologist should know about in order to be able to understand recently published results from the analysis of cosmological data sets. I will not present a complete and rigorous introduction to statistics, as there are several good books, which are reported in the references; the reader should refer to those.
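
With 10 or more parameters, evaluating a posterior on a grid quickly becomes impractical, because the number of grid points grows exponentially with dimension. This is one reason Markov chain Monte Carlo methods are widely used instead. The following is a minimal random-walk Metropolis sketch, not the specific sampler used in any published analysis. The two-parameter correlated Gaussian `log_posterior`, the step size and the chain length are hypothetical choices for illustration. It also shows that marginalizing over a parameter with MCMC samples amounts to simply ignoring that column of the chain.

```python
import math
import random

def log_posterior(theta):
    """Hypothetical 2-parameter Gaussian log-posterior (unit variances, correlation rho)."""
    x, y = theta
    rho = 0.8
    return -0.5 * (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)

def metropolis(log_post, start, step, n_steps, seed=0):
    """Random-walk Metropolis sampler; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    chain, accepted = [], 0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the acceptance ratio is just the posterior ratio.
        proposal = [c + rng.gauss(0.0, step) for c in current]
        proposal_lp = log_post(proposal)
        delta = proposal_lp - current_lp
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(tuple(current))  # a rejected step repeats the current point
    return chain, accepted / n_steps

chain, acceptance = metropolis(log_posterior, start=(0.0, 0.0), step=1.0, n_steps=50000)
samples = chain[1000:]  # discard a burn-in period

# Marginalizing over y: keep only the x column of the chain.
xs = [s[0] for s in samples]
mean_x = sum(xs) / len(xs)
std_x = math.sqrt(sum((v - mean_x) ** 2 for v in xs) / (len(xs) - 1))
```

For this toy posterior the marginal distribution of `x` is a unit Gaussian, so `mean_x` and `std_x` should land near 0 and 1. In a real analysis one would also check convergence, for example by running several chains.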

Paper Structure

This paper contains 22 sections, 41 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Left: distance modulus vs. redshift for Type Ia supernovae from the Union sample Unionsn. Right: band-power $P(k)$ for SDSS DR5 galaxies, from percivalDR5. In both cases one may fit a theory (and the theory parameters) to the data with the chi-square method. Note that in both cases the errors are correlated. In the right panel the errors are also, strictly speaking, not Gaussian distributed.
  • Figure 2: Left: example of a one-dimensional chi-square for a Gaussian distribution as a function of a parameter, with the corresponding 68.3%, 95.4% and 99.7% confidence levels. Right: a two-dimensional example for the Union supernovae data. Figure from Kowalski et al. (2009) Unionsn, reproduced by permission of the AAS. Note that in a practical application, even if the data have Gaussian errors, the errors on the parameters may not be well described by multivariate Gaussians (thus the confidence regions are not ellipses).
  • Figure 3: Marginalization effects. Top panel: the posterior distribution for the cosmological parameters of a dark energy + cold dark matter model in which both the curvature and a (constant) dark energy equation of state parameter are free. The data are the WMAP 5-year data. The red line shows the N-dimensional maximum posterior value, and the black line is the posterior marginalized over all other cosmological parameters. Figure courtesy of LAMBDA Lambda. Bottom panel: figure from haman07. Illustration of the Central Credible Interval (CCI) and the Minimum Credible Interval (MCI) for a LCDM model with a free number of effective neutrino species (ignore the blue dotted line for this example; the red line is the marginalized posterior).
  • Figure 4: Top: WMAP 1st-year data constraints in the ($\Omega_m$, $\Omega_{\Lambda}$) plane, from Spergel et al. 2003, ApJS, 148:175-194 wmap1params. Bottom: models consistent with the WMAP 3-year data, from Spergel et al. (2007), ApJS, 170, 377 wmap3params. In both cases the model is a non-flat LCDM model. Figures reproduced by permission of the AAS.
  • Figure 5: Left: constraints in the ($\Omega_m$, $\sigma_8$) plane for a flat LCDM model from WMAP 3-year data (blue), weak lensing (orange) and the combination of the two. Figure from Spergel et al. 2007 wmap3params, reproduced by permission of the AAS. Right: constraints in the ($\Omega_k$, $w$) plane for non-flat dark energy models with constant $w$, for WMAP5 + supernovae data (black) and WMAP5 + BAO (red). Figure courtesy of LAMBDA LAMBDA.
  • ...and 4 more figures
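
Figures 1 and 2 describe fitting a theory to data with correlated errors using the chi-square method, and then reading confidence levels off the chi-square curve. Below is a minimal sketch of both steps. It assumes Gaussian errors and a model that is linear in a single parameter, $m_k(\theta) = \theta\,t_k$. The two data points, their covariance and the template are hypothetical. The sketch computes $\chi^2(\theta) = (d - m)^T C^{-1} (d - m)$ on a grid and takes the region with $\Delta\chi^2 \le 1$ as the 68.3% interval. That rule holds for one Gaussian-distributed parameter. For two parameters constrained jointly the 68.3% threshold is $\Delta\chi^2 = 2.30$ instead.

```python
import math

def chi_square(residual, cov_inv):
    """chi^2 = r^T C^{-1} r for residual vector r and inverse covariance C^{-1}."""
    n = len(residual)
    return sum(residual[i] * cov_inv[i][j] * residual[j]
               for i in range(n) for j in range(n))

def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: two correlated measurements fitted with m_k(theta) = theta * t_k.
data = [1.1, 0.9]
template = [1.0, 1.0]
cov = [[0.04, 0.02], [0.02, 0.04]]  # non-zero off-diagonal term = correlated errors
cov_inv = invert_2x2(cov)

def chi2_of(theta):
    residual = [d - theta * t for d, t in zip(data, template)]
    return chi_square(residual, cov_inv)

# Brute-force grid scan over the single parameter.
grid = [0.5 + 1e-4 * i for i in range(10001)]
chi2 = [chi2_of(th) for th in grid]
chi2_min = min(chi2)
best_fit = grid[chi2.index(chi2_min)]

# One Gaussian parameter: Delta chi^2 <= 1 encloses 68.3% of the probability.
inside = [th for th, c in zip(grid, chi2) if c - chi2_min <= 1.0]
interval = (min(inside), max(inside))

def gaussian_level(delta_chi2):
    """Probability enclosed within Delta chi^2 for a single Gaussian parameter.

    Delta chi^2 = 1, 4, 9 gives about 68.3%, 95.4% and 99.7%.
    """
    return math.erf(math.sqrt(delta_chi2 / 2.0))
```

The grid is used only because it is transparent for one parameter. For a linear model the same result follows analytically: $\hat\theta = t^T C^{-1} d / t^T C^{-1} t$ and $\sigma_\theta = (t^T C^{-1} t)^{-1/2}$, which gives $1.0 \pm 0.173$ here. Ignoring the off-diagonal covariance term would instead give an error of about 0.141, which is too optimistic.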