
Calibrating Bayesian Tension Statistics using Neural Ratio Estimation

Harry T. J. Bevins, William J. Handley, Thomas Gessey-Jones

Abstract

When fits of the same physical model to two different datasets disagree, we call this tension. Several apparent tensions in cosmology have occupied researchers in recent years, and a number of different metrics have been proposed to quantify tension. Many of these metrics suffer from limiting assumptions, and correctly calibrating them is essential if we are to determine whether discrepancies are significant. A commonly used metric of tension is the evidence ratio R. The statistic has been widely adopted by the community as a Bayesian way of quantifying tensions; however, it has a non-trivial dependence on the prior that is not always properly accounted for. We show that this dependence can be calibrated out effectively with Neural Ratio Estimation. We demonstrate our proposed calibration technique with an analytic example, a toy example inspired by 21-cm cosmology, and observations of the Baryon Acoustic Oscillations from the Dark Energy Spectroscopic Instrument (DESI) and the Sloan Digital Sky Survey (SDSS). We find no significant tension between DESI and SDSS.
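For reference, the evidence ratio is conventionally defined as below. The notation ($\mathcal{L}_A$, $\mathcal{L}_B$ for the likelihoods, $\pi$ for the prior, $\mathcal{P}_A$, $\mathcal{P}_B$ for the individual posteriors) is assumed here rather than taken from the paper. Values $\log R > 0$ are usually read as concordance and $\log R < 0$ as tension. The last form makes the prior dependence explicit: for a roughly uniform prior of volume $V$ over the region where the posteriors overlap, $1/\pi(\theta) \approx V$, so $R$ grows with the prior width even when the data are unchanged. The middle form, a joint density divided by a product of marginals, is exactly the kind of ratio a neural ratio estimator is built to learn.

```latex
\begin{aligned}
\mathcal{Z}(D_A) &= \int \mathcal{L}_A(\theta)\,\pi(\theta)\,\mathrm{d}\theta,
\qquad
\mathcal{Z}(D_A, D_B) = \int \mathcal{L}_A(\theta)\,\mathcal{L}_B(\theta)\,\pi(\theta)\,\mathrm{d}\theta,
\qquad
\mathcal{P}_A(\theta) = \frac{\mathcal{L}_A(\theta)\,\pi(\theta)}{\mathcal{Z}(D_A)},\\[4pt]
R &= \frac{\mathcal{Z}(D_A, D_B)}{\mathcal{Z}(D_A)\,\mathcal{Z}(D_B)}
   = \frac{P(D_A, D_B)}{P(D_A)\,P(D_B)}
   = \int \frac{\mathcal{P}_A(\theta)\,\mathcal{P}_B(\theta)}{\pi(\theta)}\,\mathrm{d}\theta .
\end{aligned}
```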

Paper Structure (15 sections, 24 equations, 9 figures)

Figures (9)

  • Figure 1: A schematic of the neural ratio estimator (NRE) used in this work, which we refer to as a tensionnet. The NRE is trained on matched and mismatched pairs of simulated observations from two different experiments $A$ and $B$ and outputs an estimate of the tension statistic $R$. The network is trained using the binary cross entropy loss function.
  • Figure 2: Interpreting $R_\mathrm{obs}$ with NREs. The top row of the figure shows an example distribution of possible in-concordance $R$ values. Moving to the right of the median of the distribution, towards higher values of $\log R$, corresponds to increasing concordance; moving to the left, towards lower values of $\log R$, corresponds to increasing tension. The middle row shows the corresponding cumulative distribution function (CDF), and the bottom row shows how the tension statistic $T$ and concordance statistic $C$ vary with $\log R$ for this example. The observed $\log R_\mathrm{obs}$, its corresponding value on the CDF and its values of $T$ and $C$ are shown as green dashed lines. The shaded regions show the 1, 2 and 3$\sigma$ contours for both statistics, with the darker region representing 1$\sigma$ and the lighter region 3$\sigma$.
  • Figure 3: We consider two hypothetical experiments whose data can be described by a linear model with a Gaussian likelihood. Taking the prior to also be Gaussian, with a diagonal covariance $\Sigma$, we can calculate the joint and individual evidences, and hence the tension statistic $R$, analytically. We draw a test set from the joint distribution $\mathcal{Z}(D_A,D_B)$ and use it to derive the in-concordance $\log R$ distribution analytically (solid lines, top panel) and to predict it with the NRE (dashed lines, top panel) for different prior widths. We also show the sigmoid activation function for reference. The bottom row shows the predicted versus true $\log R$ values for the test set for different prior widths. Performance begins to break down for $\log R > 10$.
  • Figure 4: Using the linear model described in Section \ref{sec:validation}, we show how the in-concordance $\log R$ distribution can be used to calibrate the prior dependence of the $R$ statistic, and that the in-concordance $R$ distribution predicted by the tensionnet is largely consistent with the analytic distribution. The narrowest prior is on the top row and the widest on the bottom row. The first column shows the distribution of in-concordance $\log R$ values calculated analytically (purple) and predicted by the NRE (orange), with the analytically calculated $\log R$ for a simulation drawn from the narrow prior shown as a red dashed line. The middle column shows the CDFs derived from the two in-concordance distributions, with horizontal dashed lines marking the value of the CDF at $R_\mathrm{obs}$ according to the analytic (purple) and NRE (orange) distributions. The final column shows, for each prior, $T$ and $C$ derived from the true analytic distributions and from the NRE, averaged over five runs and shown with associated errors.
  • Figure 5: To further illustrate the application of NREs to the calibration of $R$, we use a toy example inspired by 21-cm cosmology. Left Panel: We simulate an experiment observing a Gaussian absorption trough as a function of frequency (black line) and three different scenarios in which another experiment measures a 21-cm signal with either the same or a different amplitude in a different band (red lines). To each observation we add Gaussian random noise with a standard deviation of 25 mK (shown in grey; the noise level is motivated by current observations from EDGES). Middle Panel: We train the NRE on simulated observations of the signal by both experiments, covering a wide prior range of signal parameters, and use it to evaluate the distribution of possible in-concordance $\log R$ values. We also plot the observed $\log R$ for each pair of observations from experiments A and B. Right Panel: The CDF of the in-concordance $\log R$ distribution, with the corresponding CDF value for each pair of observations. For the two in-tension pairs of observations we find $T=2.989^{+0.167}_{-0.060}$ and $T = 2.147^{+0.056}_{-0.089}$, and for the in-concordance pairs $C = 0.864^{+0.107}_{-0.076}$ and $T = 0.507^{+0.063}_{-0.078}$.
  • ...and 4 more figures
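The training scheme in the Figure 1 caption rests on the standard density-ratio trick: if a classifier $d(D_A, D_B)$ is trained with binary cross entropy on equal numbers of matched pairs, drawn from $P(D_A, D_B)$, and mismatched pairs, drawn from $P(D_A)P(D_B)$, its optimum is $d = R/(1+R)$, so its pre-sigmoid output (logit) estimates $\log R$. The dependency-free Python sketch below shows only the data preparation and the loss; the network itself is omitted, and the function names are hypothetical rather than taken from the tensionnet code.

```python
import math
import random


def make_training_pairs(sims_a, sims_b, seed=0):
    """Build a balanced training set for a tension-style NRE.

    Matched pairs (label 1): sims_a[i] and sims_b[i] come from the same
    parameter draw, i.e. samples from the joint P(D_A, D_B).
    Mismatched pairs (label 0): B is shuffled to break that link, which
    approximates samples from the product of marginals P(D_A) P(D_B).
    """
    if len(sims_a) != len(sims_b):
        raise ValueError("sims_a and sims_b must have the same length")
    rng = random.Random(seed)
    shuffled_b = list(sims_b)
    rng.shuffle(shuffled_b)  # a few pairs may land back on their partner by chance
    matched = [(a, b, 1) for a, b in zip(sims_a, sims_b)]
    mismatched = [(a, b, 0) for a, b in zip(sims_a, shuffled_b)]
    return matched + mismatched


def binary_cross_entropy(logits, labels):
    """Mean BCE for sigmoid outputs, in the stable form softplus(z) - y*z."""
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(logits)


def log_r_from_probability(d):
    """Invert d = R / (1 + R): the logit of the classifier output estimates log R."""
    return math.log(d) - math.log1p(-d)
```

In this formulation the network only ever sees pairs of simulated observations, so no likelihood or evidence has to be evaluated during training.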
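The calibration illustrated in Figure 2 turns $\log R_\mathrm{obs}$ into a position on the in-concordance CDF and then into Gaussian-equivalent significances. The sketch below assumes a two-tailed conversion, $\sigma = \sqrt{2}\,\mathrm{erfc}^{-1}(p)$, with $p = \mathrm{CDF}(\log R_\mathrm{obs})$ for $T$ and $p = 1 - \mathrm{CDF}(\log R_\mathrm{obs})$ for $C$. The captions do not state the exact convention, so treat this mapping, and the helper names, as illustrative assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist


def empirical_cdf(samples, x):
    """Fraction of in-concordance log R samples that are <= x."""
    ordered = sorted(samples)
    return bisect_right(ordered, x) / len(ordered)


def p_to_sigma(p):
    """Assumed two-tailed Gaussian equivalent: sqrt(2)*erfcinv(p) = Phi^-1(1 - p/2)."""
    # Clip so an observation outside every sample gives a large but finite value.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(1.0 - p / 2.0)


def tension_and_concordance(log_r_samples, log_r_obs):
    """Return (T, C): small CDF values give large T, large CDF values give large C."""
    p = empirical_cdf(log_r_samples, log_r_obs)
    return p_to_sigma(p), p_to_sigma(1.0 - p)
```

In practice `log_r_samples` would be the NRE's predictions on simulated in-concordance pairs, so the calibration needs no further likelihood evaluations once the network is trained.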
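For the analytic example in Figure 3, the evidences follow in closed form. Writing the model as $D_A = M_A\theta + n_A$ and $D_B = M_B\theta + n_B$, with independent Gaussian noise $n_A \sim \mathcal{N}(0, C_A)$, $n_B \sim \mathcal{N}(0, C_B)$ and prior $\theta \sim \mathcal{N}(\mu, \Sigma)$ (the symbols $M_A$, $M_B$, $C_A$, $C_B$ and $\mu$ are assumed here, not taken from the paper), a worked statement under those assumptions is:

```latex
\begin{aligned}
\mathcal{Z}(D_A) &= \mathcal{N}\!\left(D_A;\; M_A\mu,\; C_A + M_A\Sigma M_A^{\mathsf{T}}\right), \qquad
\mathcal{Z}(D_B) = \mathcal{N}\!\left(D_B;\; M_B\mu,\; C_B + M_B\Sigma M_B^{\mathsf{T}}\right),\\[4pt]
\mathcal{Z}(D_A, D_B) &= \mathcal{N}\!\left(
\begin{pmatrix} D_A \\ D_B \end{pmatrix};\;
\begin{pmatrix} M_A\mu \\ M_B\mu \end{pmatrix},\;
\begin{pmatrix}
C_A + M_A\Sigma M_A^{\mathsf{T}} & M_A\Sigma M_B^{\mathsf{T}} \\
M_B\Sigma M_A^{\mathsf{T}} & C_B + M_B\Sigma M_B^{\mathsf{T}}
\end{pmatrix}\right),\\[4pt]
\log R &= \log\mathcal{Z}(D_A, D_B) - \log\mathcal{Z}(D_A) - \log\mathcal{Z}(D_B).
\end{aligned}
```

The off-diagonal blocks arise because both datasets share the same $\theta$; drawing $\theta$ from the prior and then adding noise is one way to sample a test set from $\mathcal{Z}(D_A, D_B)$. Because $\Sigma$ enters every covariance, widening the prior changes $\log R$ even for fixed data.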
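The calibration idea behind Figure 4 can be checked in a one-parameter version of the linear model, assumed here purely for illustration: $D_A = \theta + n_A$, $D_B = \theta + n_B$, Gaussian noise variances `var_a`, `var_b` and a Gaussian prior of mean `mu` and variance `var_prior`. The sketch evaluates $\log R$ exactly, simulates the in-concordance distribution by Monte Carlo (one shared $\theta$ per simulated pair), and reads off the CDF at the observation. It uses plain Monte Carlo with the exact $\log R$ in place of an NRE, and all names are hypothetical.

```python
import math
import random
from bisect import bisect_right


def log_r_scalar(d_a, d_b, mu, var_a, var_b, var_prior):
    """Exact log R for D_A = theta + n_A, D_B = theta + n_B, theta ~ N(mu, var_prior)."""
    x, y = d_a - mu, d_b - mu
    # Joint evidence: bivariate normal with covariance [[var_a+s, s], [s, var_b+s]].
    det = var_a * var_b + var_prior * (var_a + var_b)
    quad = ((var_b + var_prior) * x * x - 2.0 * var_prior * x * y
            + (var_a + var_prior) * y * y) / det
    log_z_ab = -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    # Individual evidences: univariate normals with variance (noise + prior).
    log_z_a = -0.5 * math.log(2.0 * math.pi * (var_a + var_prior)) - x * x / (2.0 * (var_a + var_prior))
    log_z_b = -0.5 * math.log(2.0 * math.pi * (var_b + var_prior)) - y * y / (2.0 * (var_b + var_prior))
    return log_z_ab - log_z_a - log_z_b


def in_concordance_log_r(n, mu, var_a, var_b, var_prior, seed=0):
    """Monte Carlo draws of log R when both datasets share one theta."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        theta = rng.gauss(mu, math.sqrt(var_prior))
        d_a = theta + rng.gauss(0.0, math.sqrt(var_a))
        d_b = theta + rng.gauss(0.0, math.sqrt(var_b))
        out.append(log_r_scalar(d_a, d_b, mu, var_a, var_b, var_prior))
    return out


def calibrated_cdf(d_a, d_b, mu, var_a, var_b, var_prior, n=20000):
    """Return (log R_obs, in-concordance CDF evaluated at log R_obs)."""
    samples = sorted(in_concordance_log_r(n, mu, var_a, var_b, var_prior))
    log_r_obs = log_r_scalar(d_a, d_b, mu, var_a, var_b, var_prior)
    return log_r_obs, bisect_right(samples, log_r_obs) / n


if __name__ == "__main__":
    for var_prior in (1e2, 1e4):
        log_r_obs, p = calibrated_cdf(-1.5, 1.5, 0.0, 1.0, 1.0, var_prior)
        print(f"prior variance {var_prior:g}: log R = {log_r_obs:.3f}, CDF = {p:.3f}")
```

For the observation $D_A = -1.5$, $D_B = 1.5$ with unit noise variances, the raw $\log R$ changes sign between prior variances of $10^2$ and $10^4$, while for this particular model and observation the calibrated CDF value is the same for both priors up to Monte Carlo noise.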
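The simulated observations in Figure 5 can be sketched as a noise-free Gaussian trough plus Gaussian noise with a standard deviation of 25 mK. Independent noise in each channel is an assumption here, and the bands, amplitudes, centre and width below are hypothetical placeholders rather than the values used in the paper.

```python
import math
import random


def gaussian_trough(freqs_mhz, amplitude_mk, centre_mhz, width_mhz):
    """Noise-free Gaussian absorption trough in mK (negative = absorption)."""
    return [
        -amplitude_mk * math.exp(-0.5 * ((f - centre_mhz) / width_mhz) ** 2)
        for f in freqs_mhz
    ]


def observe(freqs_mhz, amplitude_mk, centre_mhz, width_mhz, noise_mk=25.0, seed=None):
    """Trough plus independent Gaussian noise (std noise_mk) in each channel."""
    rng = random.Random(seed)
    signal = gaussian_trough(freqs_mhz, amplitude_mk, centre_mhz, width_mhz)
    return [t + rng.gauss(0.0, noise_mk) for t in signal]


# Hypothetical bands and signal parameters, chosen only for illustration.
band_a = [50.0 + i for i in range(51)]   # experiment A: 50-100 MHz
band_b = [65.0 + i for i in range(51)]   # experiment B: 65-115 MHz
obs_a = observe(band_a, 200.0, 80.0, 10.0, seed=1)
obs_b_same = observe(band_b, 200.0, 80.0, 10.0, seed=2)  # same amplitude as A
obs_b_diff = observe(band_b, 500.0, 80.0, 10.0, seed=3)  # different amplitude
```

Repeating such draws over a wide prior on the signal parameters gives the matched pairs an NRE would be trained on; shuffling them gives the mismatched pairs.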