Trust the process: mapping data-driven reconstructions to informed models using stochastic processes

Stefano Rinaldi, Alexandre Toubiana, Jonathan R. Gair

TL;DR

The paper tackles the rising computational burden of gravitational-wave population analyses by proposing a two-stage framework: first perform a data-driven, non-parametric reconstruction of the binary black hole (BBH) population using a Dirichlet-process (DP) approach, then remap this reconstruction onto posterior distributions for the parameters of parametric, informed models. A third hierarchical level enables this remapping, yielding both parameter posteriors and a quantitative goodness-of-fit measure, the regularised concentration β, which signals how well a model matches the non-parametric reconstruction. The method is instantiated in two DP-based implementations (an unweighted remapping and a flexible binning approach that uses RJMCMC) and validated on Gaussian and Power-law+Peak toy models, including GWTC-3-like scenarios, showing convergence to direct-inference results as the number of events grows. This framework enables efficient cross-model comparisons and principled model selection, offering a scalable path for analyzing large GW catalogs and potentially extending to unnormalised population functions as well as other agnostic population studies. The practical impact lies in reducing per-model computation, since the expensive non-parametric step is performed only once, while preserving statistical integrity and providing an interpretable goodness-of-fit metric beyond traditional Bayes factors.

Abstract

Gravitational-wave astronomy has entered a regime where it can extract information about the population properties of the observed binary black holes. The steep increase in the number of detections will offer deeper insights, but it will also significantly raise the computational cost of testing multiple models. To address this challenge, we propose a procedure that first performs a non-parametric (data-driven) reconstruction of the underlying distribution, and then remaps these results onto a posterior for the parameters of a parametric (informed) model. The computational cost is primarily absorbed by the initial non-parametric step, while the remapping procedure is both significantly easier to perform and computationally cheaper. In addition to yielding the posterior distribution of the model parameters, this method also provides a measure of the model's goodness-of-fit, opening the way to a new quantitative comparison across models.
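
The remapping step summarised above can be illustrated with a minimal sketch. It relies only on a standard property of the Dirichlet process: on any fixed set of bins, the bin probabilities follow a Dirichlet distribution with parameters given by the concentration $\alpha$ times the probability mass the parametric model assigns to each bin. Everything else here is an assumption made for illustration and is not taken from the paper: the function names, the Gaussian target model, the fixed binning, the use of a maximum-likelihood point estimate instead of full posterior sampling, and the stand-in "non-parametric reconstruction", which is simply simulated from a Dirichlet distribution. The sketch fits $\alpha$ only; it does not compute the regularised concentration $\beta$, whose definition is not given in this summary.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import norm


def gaussian_bin_probs(edges, mu, sigma):
    """Gaussian probability mass per bin, renormalised over the binned range."""
    q = np.diff(norm.cdf(edges, loc=mu, scale=sigma))
    q = np.clip(q, 1e-300, None)  # guard against exactly empty bins
    return q / q.sum()


def neg_log_likelihood(params, edges, log_draws):
    """Minus the summed Dirichlet log-density of all draws.

    Each row of log_draws holds the log bin probabilities of one draw
    from the (hypothetical) non-parametric reconstruction.
    """
    mu, log_sigma, log10_alpha = params
    a = 10.0**log10_alpha * gaussian_bin_probs(edges, mu, np.exp(log_sigma))
    n_draws = log_draws.shape[0]
    log_norm = gammaln(a.sum()) - gammaln(a).sum()
    return -(n_draws * log_norm + ((a - 1.0) * log_draws).sum())


def remap_to_gaussian(edges, draws):
    """Point estimate of (mu, sigma, log10 alpha) from binned draws."""
    log_draws = np.log(np.clip(draws, 1e-300, None))
    # Initialise mu and sigma from the moments of the mean reconstruction.
    centres = 0.5 * (edges[:-1] + edges[1:])
    mean_p = draws.mean(axis=0)
    mu0 = (centres * mean_p).sum()
    sigma0 = np.sqrt(((centres - mu0) ** 2 * mean_p).sum())
    x0 = np.array([mu0, np.log(sigma0), 2.0])
    res = minimize(neg_log_likelihood, x0, args=(edges, log_draws),
                   method="Nelder-Mead")
    mu, log_sigma, log10_alpha = res.x
    return mu, np.exp(log_sigma), log10_alpha


# Toy stand-in for a non-parametric reconstruction (an assumption of this
# sketch): Dirichlet draws scattered around a binned Gaussian.
rng = np.random.default_rng(0)
edges = np.linspace(-4.0, 6.0, 21)
true_q = gaussian_bin_probs(edges, 1.0, 1.5)
draws = rng.dirichlet(500.0 * true_q, size=200)

mu_hat, sigma_hat, log10_alpha_hat = remap_to_gaussian(edges, draws)
print(mu_hat, sigma_hat, log10_alpha_hat)
```

In this toy setting a larger fitted $\alpha$ means the draws scatter less around the parametric shape, which is the intuition behind using a concentration-type quantity as a goodness-of-fit measure. Swapping `gaussian_bin_probs` for the binned mass of another model would let the same draws be reused for a different target without repeating the expensive first stage.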

Paper Structure

This paper contains 18 sections, 49 equations, 8 figures.

Figures (8)

  • Figure 1: Posterior on $\mu$, $\sigma$, $\log_{10}(\alpha)$ and $\beta$ as a function of the number of bins $N_b$. The contours show the $90\%$ confidence regions and the black cross-hairs indicate the true parameters of the Gaussian distribution used to generate data. The posterior on $\alpha$ drifts with the number of bins in such a way that the posterior on $\beta$ remains centred on the same position.
  • Figure 2: Comparison between the non-parametric reconstructions (shaded areas, 90% credible region), the remapped Gaussian distributions (dashed lines, median and 90% credible region) and the true probability density function (solid black line), along with the histogram of the simulated data and the result obtained by direct inference (dot-dashed green lines, median and 90% credible region). The left panel refers to the unweighted remapping approach, the right panel to the flexible binning method.
  • Figure 3: Posterior distributions obtained via remapping onto a Gaussian (left) and Cauchy (right) distribution using the two approaches described in this work and via direct inference. The contours show the $68$, $90$ and $95\%$ confidence regions and the black cross-hairs mark the true values of $\mu$ and $\sigma$ for the Gaussian case.
  • Figure 4: Summary of the $\beta$ values obtained for different target models when remapping from a flexible fit to data generated from a Gaussian distribution. Midpoints and bar ends show the median values and $90\%$ symmetric credible intervals of the $\beta$ posteriors for each model.
  • Figure 5: Comparison of the non-parametric reconstructions (shaded areas, 90% credible region), the remapped Power-law+Peak distributions (dashed lines, median and 90% credible region) and the true probability density function (solid black line), along with the histogram of the simulated data and the result obtained by direct inference (dot-dashed green lines, median and 90% credible region). The left column refers to the unweighted remapping approach, the right column to the flexible binning method.
  • ...and 3 more figures