A Bayesian quantification of consistency in correlated datasets

Fabian Köhlinger, Benjamin Joachimi, Marika Asgari, Massimo Viola, Shahab Joudaki, Tilman Tröster

TL;DR

This paper develops a three-tier Bayesian framework to quantify consistency in correlated datasets, addressing self-consistency and cross-dataset tension in cosmology. It combines (i) Bayes factors for global consistency, (ii) posterior-difference tests from duplicated parameter sets, and (iii) predictive checks in the data domain via translated posterior distributions to diagnose sources of tension. Applied to KiDS-450 cosmic shear, the approach finds no significant internal tension (<3σ) across multiple data splits and shows that much of previously claimed tension can be mitigated by accounting for correlations and updated covariance modelling. The work emphasizes that different tension metrics probe different aspects of the data-model relationship and provides a general, end-to-end methodology for assessing consistency in future large-scale structure surveys.

Abstract

We present three tiers of Bayesian consistency tests for the general case of correlated datasets. Building on duplicates of the model parameters assigned to each dataset, these tests range from Bayesian evidence ratios as a global summary statistic, to posterior distributions of model parameter differences, to consistency tests in the data domain derived from posterior predictive distributions. For each test we motivate meaningful threshold criteria for the internal consistency of datasets. Without loss of generality we focus on mutually exclusive, correlated subsets of the same dataset in this work. As an application, we revisit the consistency analysis of the two-point weak lensing shear correlation functions measured from KiDS-450 data. We split this dataset according to large vs. small angular scales, tomographic redshift bin combinations, and estimator type. We do not find any evidence for significant internal tension in the KiDS-450 data, with significances below $3\,\sigma$ in all cases. Software and data used in this analysis can be found at http://kids.strw.leidenuniv.nl/sciencedata.php
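The second tier described in the abstract (posterior distributions of parameter differences, built on duplicated parameters) can be illustrated with a small linear-Gaussian sketch. It is not the paper's implementation: it assumes a constant-amplitude model, a Gaussian likelihood with a known covariance, and flat priors, so that the posterior of the two duplicated amplitudes is Gaussian and follows from generalised least squares. The function name `difference_significance`, the toy numbers, and the correlation coefficient `r` are hypothetical choices made for illustration.

```python
import numpy as np

def difference_significance(data, cov, in_subset_a):
    """Posterior of p_a - p_b for a constant-amplitude model with duplicated parameters.

    data        : (N,) data vector
    cov         : (N, N) symmetric data covariance, including cross-subset terms
    in_subset_a : (N,) boolean mask, True for subset a, False for subset b
    Assumes a Gaussian likelihood and flat priors on p_a and p_b.
    """
    data = np.asarray(data, dtype=float)
    in_subset_a = np.asarray(in_subset_a, dtype=bool)
    # Design matrix: column 0 models subset a, column 1 models subset b.
    design = np.column_stack([in_subset_a, ~in_subset_a]).astype(float)
    cinv_design = np.linalg.solve(cov, design)       # C^{-1} A
    param_cov = np.linalg.inv(design.T @ cinv_design)  # (A^T C^{-1} A)^{-1}
    param_mean = param_cov @ (cinv_design.T @ data)
    contrast = np.array([1.0, -1.0])                 # picks out p_a - p_b
    diff_mean = contrast @ param_mean
    diff_std = np.sqrt(contrast @ param_cov @ contrast)
    return diff_mean, diff_std, abs(diff_mean) / diff_std

# Noise-free toy data: N = 10 points of width 0.1, subsets shifted by +/- q * width.
n_points, width, q = 10, 0.1, 1.0
in_a = np.arange(n_points) < n_points // 2
data = np.where(in_a, q * width, -q * width)

# Case 1: independent data points.
cov_indep = width**2 * np.eye(n_points)
# Case 2: every pair of points correlated with coefficient r (hypothetical value).
r = 0.5
cov_corr = width**2 * ((1 - r) * np.eye(n_points) + r * np.ones((n_points, n_points)))

for label, cov in [("independent", cov_indep), ("correlated", cov_corr)]:
    mean, std, n_sigma = difference_significance(data, cov, in_a)
    print(f"{label}: dp = {mean:.3f} +/- {std:.3f} ({n_sigma:.2f} sigma)")
```

In this sketch the same parameter shift is assigned a different significance once the cross-subset covariance is included, which is the reason for propagating correlations between the subsets rather than treating them as independent.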

Paper Structure

This paper contains 20 sections, 68 equations, 19 figures, 4 tables.

Figures (19)

  • Figure 1: Sketch illustrating the definition of significance criteria for tension between data and model predictions. Top: PPD case. The $c_m$ region is defined as the support of the PPD where its density is higher than the density at the position of the data. The hatched area, given by $I_{\rm PPD}=1-c_m$, is used to derive the tension significance. Bottom: TPD case. The fraction of the TPD probability mass lying in the support of the $c_m$ region of the data distribution ($I_{\rm TPD}$, hatched area) is calculated. If this fraction drops below $1 - c_m$, the distributions are in tension by $m \, \sigma$.
  • Figure 2: Sketch of a simple toy model consisting of $N$ independent data points (red) drawn from a normal distribution with width $\sigma$. The data can be modelled with a constant line with a free amplitude $p$ (black line). If the data set is split into two subsets (of equal size), we allow each subset to be modelled with shifted amplitudes $p_{\rm a} = q \, \sigma$ and $p_{\rm b} = -q \, \sigma$, respectively.
  • Figure 3: Using the toy model setup depicted in Fig. 2, i.e. $S=N-S=N/2=5$, $\sigma=0.1$ and hence $\Delta/\sigma \gg 1$ for $\Delta = \{1., 10.\}$, we derive analytically tractable results for the three tiers of consistency tests as functions of the model shift parameter $q$: a) the Bayes factor (equation \ref{eq:toy_tier1}). Note that this estimator is the only one strongly depending on the prior width, $\Delta$. We interpret the Bayes factor here in terms of Jeffreys' scale and the statements should be read as 'barely worth mentioning', 'substantial', etc. evidence for $\mathrm{H}_1:$ 'there exist two separate parameter sets that each describe one subset of the data'; b) the relative error of the parameter difference PDF (equation \ref{eq:toy_tier2}); c) significances for the TPD-based consistency estimator (derived from equation \ref{eq:toy_chi_sqr_corr_final}). To highlight the impact of a proper propagation of all correlations, we compare the fiducial case of including 'all correlations' (i.e. data subsets and parameters; solid blue line) to the naïve case of 'no correlations' (dashed black line) and 'parameter correlations' only (dotted grey line).
  • Figure 4: Tension significance criteria for a Gaussian toy model in one dimension a) and ten dimensions b). The significance $m\, \sigma$ is plotted as a function of the shift of the best-fit model (i.e. the peak of the TPD or PPD) with respect to the data, $\mu$, in units of the standard deviation of the data measurement error, $s$. Black (dark blue) lines correspond to the definition of tension based on the TPD (PPD) with different line styles showing the dependence on the TPD width, $t$ (as given in the legend, in units of $s$). The light blue line follows the definition of Efstathiou2018. The red line is a naive criterion taken as the relative shift of the data vector, divided by $\sqrt{N}$, where $N$ is the dimension of the distribution under consideration. In one dimension the red line therefore marks a one-to-one relation (overlapping the blue solid line), which is closely approximated by the TPD and PPD definitions of significance as $t/s \rightarrow 0$.
  • Figure 5: The correlation matrix of the $\xi_\pm$ correlation function covariance, $\mathbfss{C}$, for all fiducial angular scales, $\theta$, and tomographic bin combinations, $i \times \, j$ (see 'fiducial scales' in Table \ref{tab:scales}).
  • ...and 14 more figures
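
The PPD criterion in the caption of Figure 1 can be sketched for the simplest case of a one-dimensional Gaussian PPD. This is an illustration under that assumption only, not the paper's code: for a Gaussian, the region where the PPD density exceeds its value at the data point is a symmetric interval around the mean, so $c_m$ and $I_{\rm PPD} = 1 - c_m$ have closed forms. The function name `ppd_tension_sigma` and its arguments are hypothetical.

```python
from math import erf, sqrt
from statistics import NormalDist

def ppd_tension_sigma(datum, ppd_mean, ppd_std):
    """Tail area I_PPD and tension m (in sigma) for a 1D Gaussian PPD (assumption)."""
    # For a Gaussian, the density is higher than at the datum exactly where
    # |x - mean| < |datum - mean|, i.e. on a symmetric interval.
    z = abs(datum - ppd_mean) / ppd_std
    c_m = erf(z / sqrt(2.0))   # probability mass of that interval
    i_ppd = 1.0 - c_m          # remaining tail area (hatched in Fig. 1, top)
    if i_ppd <= 0.0:
        # Tail area underflowed to zero in floating point.
        return i_ppd, float("inf")
    # Convert the two-sided tail area back to the m of an m-sigma interval.
    m = NormalDist().inv_cdf(1.0 - 0.5 * i_ppd)
    return i_ppd, m

# A datum two standard deviations away from the PPD mean.
tail_area, n_sigma = ppd_tension_sigma(datum=2.0, ppd_mean=0.0, ppd_std=1.0)
print(f"I_PPD = {tail_area:.4f}, tension = {n_sigma:.2f} sigma")
```

In one dimension the recovered $m$ is simply the distance of the datum from the PPD mean in units of the PPD width; the conversion through the tail area only matters in higher dimensions or for non-Gaussian PPDs, where the region has to be estimated from samples.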