Trust the process: mapping data-driven reconstructions to informed models using stochastic processes
Stefano Rinaldi, Alexandre Toubiana, Jonathan R. Gair
TL;DR
The paper tackles the rising computational cost of gravitational-wave population analyses with a two-stage framework: first perform a data-driven, non-parametric reconstruction of the binary black hole (BBH) population using a Dirichlet-process (DP) approach, then remap this reconstruction onto posterior distributions for the parameters of parametric, informed models. A third hierarchical level enables this remapping and yields both the parameter posteriors and a quantitative goodness-of-fit measure, the regularised concentration β, which indicates how well a model matches the non-parametric reconstruction. The method is instantiated in two DP-based implementations (an unweighted remapping and a flexible binning approach that uses reversible-jump MCMC, RJMCMC) and validated on Gaussian and Power-law+Peak toy models, including GWTC-3-like scenarios, showing convergence to direct-inference results as the number of events grows. The framework enables efficient cross-model comparison and principled model selection, offering a scalable path for analysing large GW catalogs, with potential extensions to unnormalised population functions and to other agnostic population studies. Its practical impact lies in reducing the per-model computational cost while preserving statistical integrity and providing an interpretable goodness-of-fit metric that complements traditional Bayes factors.
Abstract
Gravitational-wave astronomy has entered a regime where it can extract information about the population properties of the observed binary black holes. The steep increase in the number of detections will offer deeper insights, but it will also significantly raise the computational cost of testing multiple models. To address this challenge, we propose a procedure that first performs a non-parametric (data-driven) reconstruction of the underlying distribution, and then remaps these results onto a posterior for the parameters of a parametric (informed) model. The computational cost is primarily absorbed by the initial non-parametric step, while the remapping procedure is both significantly easier to perform and computationally cheaper. In addition to yielding the posterior distribution of the model parameters, this method also provides a measure of the model's goodness-of-fit, opening the way to a new quantitative comparison across models.
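To make the two-stage idea concrete, the following is a minimal toy sketch and not the implementation used in the paper. It rests on several assumptions made purely for illustration: the non-parametric stage is replaced by Dirichlet draws over fixed bins, the informed model is a Gaussian, each draw is assumed to follow a Dirichlet distribution with parameters β·q(θ), where q(θ) are the model's bin probabilities, and the log-density is averaged over draws. All names (`bin_probs`, `draws`, `betas`, and so on) are hypothetical.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import norm

rng = np.random.default_rng(0)

# Fixed bins, symmetric around the true mean (an illustrative choice).
edges = np.linspace(-3.0, 4.0, 15)
true_mu, true_sigma = 0.5, 1.0

def bin_probs(mu, sigma):
    """Gaussian probability mass per bin, renormalised over the grid."""
    p = np.diff(norm.cdf(edges, loc=mu, scale=sigma))
    p = np.clip(p, 1e-300, None)  # avoid exact zeros in far tails
    return p / p.sum()

# Stage 1 stand-in (assumption): simulate binned events and draw bin
# weights from a Dirichlet posterior with a flat prior. A real analysis
# would use the draws of an actual non-parametric reconstruction.
counts = rng.multinomial(500, bin_probs(true_mu, true_sigma))
draws = rng.dirichlet(counts + 1.0, size=200)  # shape (n_draws, n_bins)

# Stage 2 (toy remapping): assume each draw p ~ Dirichlet(beta * q(theta)).
# The log-density averaged over draws depends on them only via mean(log p).
mean_logp = np.log(draws).mean(axis=0)

mus = np.linspace(-1.0, 2.0, 31)
sigmas = np.linspace(0.5, 2.0, 16)
betas = np.logspace(0.0, 4.0, 25)  # concentration grid

log_like = np.empty((mus.size, sigmas.size, betas.size))
for i, mu in enumerate(mus):
    for j, sigma in enumerate(sigmas):
        q = bin_probs(mu, sigma)
        alpha = betas[:, None] * q[None, :]  # shape (n_beta, n_bins)
        log_like[i, j] = (
            gammaln(betas)                    # alpha sums to beta
            - gammaln(alpha).sum(axis=1)
            + ((alpha - 1.0) * mean_logp).sum(axis=1)
        )

i_mu, i_sigma, i_beta = np.unravel_index(np.argmax(log_like), log_like.shape)
mu_best, sigma_best, beta_best = mus[i_mu], sigmas[i_sigma], betas[i_beta]
print(f"mu={mu_best:.2f}, sigma={sigma_best:.2f}, beta={beta_best:.1f}")
```

In this toy setting the expensive step (producing `draws`) is done once, while the grid evaluation can be repeated cheaply for any other parametric model by swapping `bin_probs`. A larger best-fit β for one model than for another would suggest, under the stated assumptions, that its bin probabilities track the reconstruction more closely.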
