Cobaya: Code for Bayesian Analysis of hierarchical physical models

Jesus Torrado, Antony Lewis

Abstract

We present Cobaya, a general-purpose Bayesian analysis code aimed at models with complex internal interdependencies. Without the need for specific code by the user, interdependencies between different stages of a model pipeline are exploited for sampling efficiency: intermediate results are automatically cached, and parameters are grouped in blocks according to their dependencies and optimally sorted, taking into account their individual computational costs, so as to minimize the cost of their variation during sampling, thanks to a novel algorithm. Cobaya allows exploration of posteriors using a range of Monte Carlo samplers, and also has functions for maximization and importance-reweighting of Monte Carlo samples with new priors and likelihoods. Cobaya is written in Python in a modular way that allows for extensibility, use of calculations provided by external packages, and dynamical reparameterization without modifying its source. It can exploit hybrid OpenMP/MPI parallelization, and has sub-millisecond overhead per posterior evaluation. Though Cobaya is a general-purpose statistical framework, it includes interfaces to a set of cosmological Boltzmann codes and likelihoods (the latter being agnostic with respect to the choice of the former), and automatic installers for external dependencies.

Paper Structure

This paper contains 23 sections, 9 equations, 9 figures.

Figures (9)

  • Figure 1: Simplified structure of Cobaya's source, showing classes (squares) and parameters (ellipses). See section \ref{sec:structure} for a description of each class and parameter role. The arrows between TheoryCollection and LikelihoodCollection represent computed quantities and parameters that can be exchanged arbitrarily between theories and likelihoods.
  • Figure 2: Example input in plain text (YAML). It defines a Gaussian-ring likelihood with radius $1$ and standard deviation $0.02$, over the combination of a uniform prior $(x,y)\in(0,2)^2$ (notice the two different specifications used for x and y) and a Gaussian prior of standard deviation $0.3$ along the $x=y$ direction (n.b.: simple 1D priors are defined in params, while multidimensional ones are defined in prior). The likelihood, the multidimensional prior and the derived parameters r and theta are given as Python functions (here as source strings, but they can be assigned Python functions directly when working in a Python file or shell; for source strings, scipy.stats and numpy are pre-imported as stats and np, respectively). The results of this MCMC sample will be written to a folder called chains with file name prefix ring, as per the output option. The resulting densities can be seen in Fig. \ref{fig:results}.
  • Figure 3: Example similar to the one in Fig. \ref{fig:input}, now using Cobaya classes to split the computation in two: the transformation between Cartesian and polar coordinates, and the likelihood in terms of polar coordinates. Here we let the mean radius of the Gaussian ring vary over a narrow prior, to illustrate Cobaya's automated blocking: when sampling using MCMC or PolyChord, jumps in the $(x, y)$ directions will be alternated with jumps in mean_radius. After every jump in $(x, y)$, the resulting intermediate product $r$ is cached, so that $r$ does not need to be recomputed when only mean_radius is varied. In this trivial example, the intermediate quantity $r$ exchanged between the theory and the likelihood is just a real number, but it could be any arbitrarily complicated and many-dimensional numerical quantity, as well as a general Python object. Cobaya knows about the interdependency between the likelihood, which needs $r$, and the theory, which computes $r$, via the respective declarations in methods Likelihood.get_requirements and Theory.get_can_provide.
  • Figure 4: Results from sampling using the inputs shown in Fig. \ref{fig:input}, analysed by GetDist. The upper figure shows a triangle plot combining 2D posterior contours (enclosing 68% and 95% of the probability) and marginalized 1D posteriors for the sampled parameters $(x,y)$. The lower plot shows the 1D posteriors for the derived parameters $(r,\theta)$. All the posteriors are shown normalized to the same maximum.
  • Figure 5: 1- and 2-d marginalized posteriors for the feature parameters in the delensed scenario ($\Lambda$CDM cosmological parameters were sampled but are not shown here). There exists a degeneracy towards simultaneous large amplitude, wide envelope and large-$k$ center reaching the prior boundaries (see discussion in main text).
  • ...and 4 more figures
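
The Gaussian-ring model described in the captions of Figures 2 and 3 can be sketched in plain Python to show why caching the intermediate radius helps. This is only an illustrative toy using the standard library, not Cobaya's API or its caching mechanism: the names `radius`, `ring_loglike`, `logprior`, `logposterior` and `CALLS` are hypothetical, the cache is a one-entry `functools.lru_cache`, and the prior "along the $x=y$ direction" is assumed here to mean a Gaussian of standard deviation $0.3$ in $x-y$.

```python
import math
from functools import lru_cache

# Hypothetical counter: how many times the "theory" stage is really evaluated.
CALLS = {"radius": 0}


@lru_cache(maxsize=1)
def radius(x, y):
    """Toy 'theory' stage: Cartesian (x, y) -> polar radius r (cached)."""
    CALLS["radius"] += 1
    return math.hypot(x, y)


def norm_logpdf(value, loc, scale):
    """Log-density of a 1D Gaussian with mean `loc` and std `scale`."""
    z = (value - loc) / scale
    return -0.5 * z * z - math.log(scale * math.sqrt(2.0 * math.pi))


def ring_loglike(r, mean_radius=1.0, sigma=0.02):
    """Toy 'likelihood' stage: Gaussian ring written in terms of r only."""
    return norm_logpdf(r, mean_radius, sigma)


def logprior(x, y):
    """Uniform prior on (0, 2)^2 times an assumed Gaussian band in x - y."""
    if not (0.0 < x < 2.0 and 0.0 < y < 2.0):
        return -math.inf
    # The uniform density on (0, 2)^2 is 1/4.
    return -math.log(4.0) + norm_logpdf(x - y, 0.0, 0.3)


def logposterior(x, y, mean_radius=1.0):
    """Unnormalized log-posterior; r is reused if (x, y) did not change."""
    lp = logprior(x, y)
    if lp == -math.inf:
        return lp
    return lp + ring_loglike(radius(x, y), mean_radius)
```

Calling `logposterior(0.6, 0.8, mean_radius=1.0)` and then `logposterior(0.6, 0.8, mean_radius=1.01)` evaluates `radius` only once, which mimics a jump in mean_radius alone after a jump in $(x, y)$. With `mean_radius` left at its default of `1.0`, the function reduces to the fixed-radius model of Figure 2.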