Costless correction of chain based nested sampling parameter estimation in gravitational wave data and beyond

Metha Prathaban, Will Handley

TL;DR

Nested sampling parameter estimation suffers an additional stochastic uncertainty from parameter variation along iso-likelihood contours, not captured by standard evidence-based error bars. The authors introduce two phantom-point-based approaches (likelihood binning and reconstructed phantom runs) to quantify and validate this uncertainty using extra likelihood evaluations produced during runtime, demonstrated on simulated gravitational-wave BBH signals. They show that these methods yield error bars and p–p plot coverages comparable to or improving upon the Higson bootstrapping method, though some parameters (e.g., luminosity distance) may require longer chains for convergence. The techniques provide single-run verification of error bars, are broadly applicable to any chain-based nested sampler, and have important implications for credible intervals and coverage in gravitational-wave analyses and beyond.

Abstract

Nested sampling parameter estimation differs from evidence estimation, in that it incurs an additional source of uncertainty. This uncertainty affects estimates of parameter means and credible intervals in gravitational wave analyses and beyond, and yet, it is typically not accounted for in standard uncertainty estimation methods. In this paper, we present two novel methods to quantify this uncertainty more accurately for any chain based nested sampler, using the additional likelihood calls made at runtime in producing independent samples. Using injected signals of black hole binary coalescences as an example, we first show concretely that the usual uncertainty estimation method is insufficient to capture the true error bar on parameter estimates. We then demonstrate how the extra points in the chains of chain based samplers may be carefully utilised to estimate this uncertainty correctly, and provide a way to check the accuracy of the resulting error bars. Finally, we discuss how this uncertainty affects $p$-$p$ plots and coverage assessments.
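
For orientation, the "usual uncertainty estimation method" mentioned above can be sketched as follows. This is a minimal illustration, assuming it refers to the standard simulated-weights procedure in which only the prior-volume shrinkage ratios are resampled, each as $t \sim \mathrm{Beta}(n_\mathrm{live}, 1)$, while the parameter value at each dead point is held fixed. The function name and arguments are hypothetical, the number of live points is assumed constant, and the contribution of the final live points is ignored.

```python
import math
import random


def simulated_weight_estimates(logL, f, nlive, n_sims=200, seed=0):
    """Posterior-mean estimates of f from the dead points of one run.

    logL : log-likelihoods of the dead points, in increasing order
    f    : parameter value at each dead point (held fixed throughout)
    Only the prior-volume shrinkage ratios are resampled, so any variation
    of f along each iso-likelihood contour is NOT reflected in the spread.
    """
    rng = random.Random(seed)
    max_logL = max(logL)
    estimates = []
    for _ in range(n_sims):
        log_X_prev = 0.0  # prior volume starts at X = 1
        num = den = 0.0
        for logL_i, f_i in zip(logL, f):
            # t ~ Beta(nlive, 1) is equivalent to t = u**(1/nlive), u ~ U(0, 1];
            # 1 - random() lies in (0, 1], which keeps the logarithm finite.
            log_t = math.log(1.0 - rng.random()) / nlive
            log_X = log_X_prev + log_t
            # Posterior weight: likelihood times the prior-volume shell width.
            width = math.exp(log_X_prev) - math.exp(log_X)
            w = width * math.exp(logL_i - max_logL)
            num += w * f_i
            den += w
            log_X_prev = log_X
        estimates.append(num / den)
    return estimates
```

The mean and standard deviation of the returned list would then be quoted as the estimate and its error bar. Because `f` is never resampled, the spread reflects only the unknown prior volumes, which is the shortfall the paper sets out to correct.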

Paper Structure (21 sections, 8 equations, 20 figures)

Figures (20)

  • Figure 1: Schematic of a typical nested sampling run with a chain based sampler. At the end of the run, we are left with a set of dead points (black), which define a series of nested iso-likelihood contours. To generate a new live point from a given point, chain based samplers use a Markov-Chain based procedure to continually generate points within its likelihood contour, until the new point is deemed uncorrelated enough with the original point from which it was seeded. Thus, at the end of the run we are also left with a set of 'phantom points' (red), the exact number of which depends on the chain lengths. An example chain is shown in black between dead points $2$ and $3$. In parameter estimation, we typically only use the dead points and, therefore, must use the parameter values of each dead point as a proxy for the average parameter value along the entire contour. For an example two-parameter case, where the parameter being estimated is the sum $f(\theta)=\theta_1 + \theta_2$, the contours of constant $f(\theta)$ are shown (dashed). In this case, the parameter value of the dead points is not necessarily representative of the average parameter value over the contours, and this will be the dominant source of uncertainty in our $f(\theta)$ estimate. However, the phantom points can provide a better understanding of the variation of this parameter along the contours, enabling a more accurate quantification of this uncertainty.
  • Figure 2: Posteriors recovered from the injected signal using PolyChord, with the injected parameter values indicated. Since parameter estimation studies require large numbers of runs, we only sample these $4$ parameters, with the other parameters simply set to their injected values.
  • Figure 3: Typically, a distribution of evidence estimates is computed from a single run using the 'simulated weights' method (top left). The mean and standard deviation of this distribution are then quoted as the evidence value and its corresponding uncertainty (black). If the error bars from single runs accurately quantify the uncertainty on the evidence due to unknown prior volumes, the overall mean evidence, computed over many runs, should lie in the $1^{\textrm{st}}$ percentile of estimates $1\%$ of the time, in the $50^{\textrm{th}}$ percentile $50\%$ of the time, and so on. Hence, plotting a histogram of the percentiles from each run in which the overall mean (blue dashed line) lies should give a uniform distribution from 0 to 100 (bottom left). We see here that this is indeed the case for the evidences, with the Kolmogorov-Smirnov $p$-value being 0.25, demonstrating that resampling the prior volumes is sufficient for estimating the error bar on the evidence for this example. For the chirp mass, even by eye the estimates per run are not wide enough to capture the variation in estimates between runs (top and middle right). This indicates that the variation in chirp mass along a given contour is not being properly accounted for. The percentiles plot (bottom right) confirms that the uncertainty estimate on a single nested sampling run is too optimistic. More often than would be the case for correctly distributed errors, the overall mean estimate for the chirp mass over 100 runs (blue dashed line) lies deep in the tails of the estimated chirp mass from a single run. The simulated weights method is not sufficient for calculating the parameter estimation uncertainty.
  • Figure 4: For each of the first 10 nested sampling runs performed on our simulated dataset, the new distribution of chirp mass estimates is plotted, obtained by resampling both shrinkage ratios and chirp mass values from likelihood bins around each contour (top). Resampling both the weights and the chirp mass gives wider error bars, as expected, and even by eye these seem more consistent with the spread of estimates across runs. Testing the likelihood binning method more rigorously, we see that the percentiles of the overall mean chirp mass are now consistent with being uniformly distributed (bottom, purple). This indicates that the true variation of the chirp mass along the contours is now being correctly accounted for, and thus the stochasticity of nested sampling parameter estimation is properly captured.
  • Figure 5: As with the likelihood binning method, computing parameter estimates from the reconstructed phantom runs gives correctly distributed percentiles. Again, we capture the stochasticity of nested sampling parameter estimation without additional computational cost.
  • ...and 15 more figures
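
The percentile check described in the captions of Figures 3 to 5 can be sketched as below. This is a hedged toy illustration, not the authors' code: the helper names are hypothetical, the "runs" are synthetic Gaussian draws rather than nested sampling output, and the Kolmogorov-Smirnov $p$-value uses the standard asymptotic series with a small-sample correction rather than an exact calculation.

```python
import math
import random


def percentile_of(value, samples):
    """Percentage (0 to 100) of samples lying strictly below value."""
    return 100.0 * sum(s < value for s in samples) / len(samples)


def ks_uniform(percentiles):
    """One-sample KS test of percentiles against Uniform(0, 100).

    Returns (statistic, approximate p-value). The p-value comes from the
    asymptotic Kolmogorov series, so it is only approximate for small n.
    """
    u = sorted(p / 100.0 for p in percentiles)
    n = len(u)
    # Largest gap between the empirical CDF and the uniform CDF.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


def toy_percentiles(width_scale, n_runs=200, n_samples=200, seed=1):
    """Synthetic stand-in for many runs (an assumption for illustration).

    Each run's point estimate scatters with unit standard deviation about
    the truth; its single-run error distribution has a width equal to
    width_scale times that true scatter.
    """
    rng = random.Random(seed)
    centres = [rng.gauss(0.0, 1.0) for _ in range(n_runs)]
    overall_mean = sum(centres) / n_runs
    return [
        percentile_of(overall_mean,
                      [rng.gauss(c, width_scale) for _ in range(n_samples)])
        for c in centres
    ]


# Error bars matching the true scatter versus error bars ten times too narrow.
d_ok, p_ok = ks_uniform(toy_percentiles(1.0))
d_bad, p_bad = ks_uniform(toy_percentiles(0.1))
print(f"calibrated:    D = {d_ok:.3f}, p = {p_ok:.3g}")
print(f"overconfident: D = {d_bad:.3f}, p = {p_bad:.3g}")
```

With error bars that are too narrow, the overall mean falls deep in the tails of most single-run distributions, so the percentiles pile up near 0 and 100 and the test rejects uniformity. This is the behaviour the Figure 3 caption reports for the chirp mass under the simulated weights method.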