Understanding parameter differences between analyses employing nested data subsets

Steven Gratton, Anthony Challinor

TL;DR

This work develops an analytic framework for understanding parameter shifts between an analysis of a full dataset and an analysis of a nested subset of it, attributing the differences to noise scatter and intrinsic variance. By expanding the likelihood action to second order and exploiting Fisher-information-like averages, it derives a simple expression for the covariance of the parameter differences: the covariance of the shift between the partial and full analyses equals $\bar{S_1''}^{-1}-\bar{S''}^{-1}$. This allows a Gaussian, multivariate test for shifts in several parameters at once and is validated with a Planck-like CMB example, in which the observed shifts largely conform to the predicted distribution. The paper further discusses extensions to multiple nested subsets, non-nested data, and Wilks' theorem, providing a broadly applicable tool for assessing data coherence and model adequacy in Bayesian inference contexts.

Abstract

We provide an analytical argument for understanding the likely nature of parameter shifts between those coming from an analysis of a dataset and from a subset of that dataset, assuming differences are down to noise and any intrinsic variance alone. This gives us a measure against which we can interpret changes seen in parameters and make judgements about the coherency of the data and the suitability of a model in describing those data.
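The covariance result quoted in the TL;DR can be checked numerically in the simplest setting where it holds exactly. The sketch below is not taken from the paper: it assumes a hypothetical linear model with unit-variance Gaussian noise, for which the curvature matrices reduce to `A.T @ A` for the full data and `A1.T @ A1` for the subset. All names (`A`, `A1`, `shift_samples`, and so on) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model (an assumption for illustration):
# d = A @ theta + noise, with independent unit-variance Gaussian noise.
n_full, n_part, n_par = 40, 15, 2
A = rng.normal(size=(n_full, n_par))   # design matrix for the full dataset
A1 = A[:n_part]                        # nested subset: the first n_part data points
theta_true = np.array([1.0, -0.5])

# With unit noise the curvature (Fisher) matrices are A^T A and A1^T A1,
# so the parameter covariances are their inverses.
cov_full = np.linalg.inv(A.T @ A)
cov_part = np.linalg.inv(A1.T @ A1)

# Predicted covariance of the shift: partial covariance minus full covariance.
cov_pred = cov_part - cov_full


def shift_samples(n_sims):
    """Simulate datasets and return best-fit shifts (partial minus full)."""
    noise = rng.normal(size=(n_sims, n_full))
    d = theta_true @ A.T + noise                 # shape (n_sims, n_full)
    theta_full = d @ A @ cov_full                # least-squares fit to all data
    theta_part = d[:, :n_part] @ A1 @ cov_part   # least-squares fit to the subset
    return theta_part - theta_full


delta = shift_samples(20000)
cov_emp = np.cov(delta, rowvar=False)

# Effective chi^2 of each simulated shift under the predicted covariance.
chi2 = np.einsum("ni,ij,nj->n", delta, np.linalg.inv(cov_pred), delta)

print("predicted covariance of shifts:\n", cov_pred)
print("empirical covariance of shifts:\n", cov_emp)
print("mean chi^2 (expect about n_par):", chi2.mean())
```

In this toy model the full and partial estimates are correlated because they share data, which is why the shift covariance is the difference of the two covariances rather than their sum.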

Paper Structure

This paper contains 7 sections, 28 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Two-dimensional marginalised posterior distributions for a full (blue; smaller contours) and a partial (red; larger contours) analysis of a simulated CMB dataset.
  • Figure 2: Representative plots showing shifts in pairs of parameters between the partial and full analyses of 100 simulated CMB datasets (large blue filled circles), compared to those expected from Eq. (\ref{eq:covresult}) (illustrated via 500 Gaussian realizations displayed with small red open circles).
  • Figure 3: Normalized histogram showing the effective $\chi^2$ from Eq. (\ref{eq:deltadist}) evaluated for the difference between the partial and full analyses of 100 simulated CMB datasets, using analytic covariances computed around the fiducial model (including terms accounting for tolerances in the minimization procedure), compared to a $\chi^2$ distribution for six degrees of freedom.
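
The comparison in Figure 3 can be sketched for a single observed shift as follows. This is a minimal illustration, not the paper's code: it assumes the effective $\chi^2$ is the quadratic form of the shift with the inverse of the difference of covariances, and every number below is made up. The helper names `shift_chi2` and `chi2_sf_even` are hypothetical. The tail probability uses the closed form of the $\chi^2$ survival function, which is valid only for an even number of degrees of freedom, such as the six used here.

```python
import math
import numpy as np


def shift_chi2(theta_part, theta_full, cov_part, cov_full):
    """Effective chi^2 of a parameter shift, assuming cov(shift) = cov_part - cov_full."""
    delta = np.asarray(theta_part) - np.asarray(theta_full)
    cov_delta = np.asarray(cov_part) - np.asarray(cov_full)
    return float(delta @ np.linalg.solve(cov_delta, delta))


def chi2_sf_even(x, dof):
    """Probability that a chi^2 variable exceeds x; closed form for even dof only."""
    if dof % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(dof // 2))


# Made-up illustrative inputs for six parameters (not values from the paper).
cov_full = np.diag([1.0, 2.0, 0.5, 1.5, 1.0, 0.8])
cov_part = 2.0 * cov_full            # the subset constrains each parameter less tightly
theta_full = np.zeros(6)
# Shift every parameter by one predicted standard deviation of the shift.
theta_part = np.sqrt(np.diag(cov_part - cov_full))

chi2_value = shift_chi2(theta_part, theta_full, cov_part, cov_full)
p_value = chi2_sf_even(chi2_value, dof=6)
print(f"chi^2 = {chi2_value:.3f}, p-value = {p_value:.3f}")
```

A small p-value would suggest that the shift is larger than noise and intrinsic variance alone would typically produce. For an odd number of parameters a general routine such as `scipy.stats.chi2.sf` would be needed instead of the closed form.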