
Pushing spectral siren cosmology into the third-generation era: a blinded mock data challenge

Matteo Tagliazucchi, Michele Moresco, Alessandro Agapito, Michele Mancarella, Sarah Ferraiuolo, Simone Mastrogiovanni, Nicola Borghi, Francesco Pannarale, Daniele Bonacorsi

TL;DR

This paper addresses the challenge of performing spectral-siren cosmology with the large data volumes anticipated from third-generation gravitational-wave detectors by conducting a blinded mock data challenge that compares three public inference pipelines ($\texttt{ICAROGW}$, $\texttt{CHIMERA}$, $\texttt{pymcpop-gw}$). It shows that GPU-accelerated implementations can scale to ET-like catalogs and deliver consistent cosmological and population parameter constraints, including precise measurements of $H(z)$ at intermediate redshifts and meaningful joint constraints on $H_0$ and $\Omega_{\rm m,0}$. The work identifies which GW events contribute most to constraining cosmology, highlighting that low-distance events and population-feature mass scales are especially informative, and it provides a validated framework for spectral-siren analyses in the ET era. It also outlines practical pathways to scale analyses across distributed GPUs and to incorporate more realistic, evolving population models in future forecasts.

Abstract

Gravitational wave (GW) spectral sirens offer a promising method for measuring cosmological parameters using GW data alone, without relying on external redshift information such as electromagnetic counterparts or galaxy catalogs, by exploiting distributional features in the population of GW sources. The advent of third-generation detectors like the Einstein Telescope (ET) will provide catalogs three orders of magnitude larger than current ones, raising questions about the scalability and robustness of existing inference pipelines. We present a blinded mock data challenge that tests three public pipelines with distinct numerical implementations, namely $\texttt{ICAROGW}$, $\texttt{CHIMERA}$, and $\texttt{pymcpop-gw}$, on simulated ET observations containing the best $\mathcal{O}(10^4)$ binary black hole mergers that can be observed in 1 year. We assess their computational performance, validate their agreement in a blinded setting, and forecast cosmological constraints. We find that, thanks to GPU acceleration, these pipelines can process the events expected from ET within a manageable timeframe. All pipelines recover consistent cosmological and population parameters. Assuming a flat $\Lambda$CDM model, we measure $H(z)$ at $z\sim1.5$ with 2.4% precision, and achieve a mean precision on $H(z)$ of 2.8% across $0.7<z<1.8$ with a catalog of $\sim 12{,}000$ high-S/N events. This corresponds to joint constraints of $\sim 10\%$ on $H_0$ and $\sim 26\%$ on $\Omega_{\rm m,0}$. We also identify the events that contribute most to constraining cosmological parameters, showing that low-distance sources near population features drive the constraining power on all cosmological parameters, while higher-distance events primarily constrain $\Omega_{\rm m,0}$. Our results establish a validated, performance-tested framework for spectral siren cosmology in the era of third-generation GW observatories.
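
To make the quoted $H(z)$ precision concrete, the sketch below shows one way joint samples of $(H_0, \Omega_{\rm m,0})$ could be mapped to $H(z)$ in flat $\Lambda$CDM, $H(z) = H_0\sqrt{\Omega_{\rm m,0}(1+z)^3 + 1 - \Omega_{\rm m,0}}$, with the relative precision taken as the width of the central 68% interval divided by twice the median (the convention stated in the Figure 5 caption). Neglecting radiation, the function names, and the toy Gaussian samples are all assumptions of this illustration; none of it is taken from the three pipelines.

```python
import numpy as np


def hubble_flat_lcdm(z, h0, omega_m):
    """H(z) for flat LambdaCDM with radiation neglected (an assumption here)."""
    z = np.asarray(z, dtype=float)
    return h0 * np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def relative_precision(samples):
    """Width of the central 68% interval divided by twice the median."""
    lo, med, hi = np.percentile(samples, [16.0, 50.0, 84.0])
    return (hi - lo) / (2.0 * med)


# Toy, uncorrelated Gaussian draws standing in for a joint posterior.
# These are NOT the paper's posterior samples.
rng = np.random.default_rng(0)
h0_samples = rng.normal(70.0, 7.0, size=20_000)
om_samples = np.clip(rng.normal(0.3, 0.05, size=20_000), 0.0, 1.0)

for z in (0.0, 0.7, 1.5, 1.8):
    hz = hubble_flat_lcdm(z, h0_samples, om_samples)
    print(f"z = {z:.1f}: relative precision on H(z) = {relative_precision(hz):.3f}")
```

Because the toy draws are uncorrelated, the printed numbers say nothing about the paper's results; a real forecast would substitute the correlated posterior samples produced by the pipelines.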

Paper Structure (12 sections, 15 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Properties of the blinded mock BBH catalogs. Left panel: Reverse cumulative distribution of the S/N for the generated population. Dashed lines indicate the S/N thresholds used to define the two catalogs. Middle and right panels: mass and redshift-rate (Eq. \ref{eq:p-z}) distributions, respectively, for events in the two catalogs, compared with the fiducial population model (dashed lines).
  • Figure 2: Scaling of computational times for a single evaluation and for a complete population fit as a function of the number of events and the total number of data points $D = 3\times(N_{\rm obs}\times N_{\rm PE} + N_{\rm inj})$, where $N_{\rm PE}=5000$ per event and $N_{\rm inj} = N_{\rm obs}\times 390$. Markers connected by a solid line correspond to actual simulated catalogs with S/N cuts of 60 and 75, as labeled. Markers connected by a dashed line refer to timings obtained using projected catalogs constructed by stacking events and injections from the S/N > 60 catalog. The S/N cut corresponding to each number of events is indicated on the $x$-axis. Black vertical lines show the memory saturation limits for $\texttt{CHIMERA}$. The gray bands indicate the approximate event volume expected for ET in one year of observation.
  • Figure 3: Cosmological and population parameter constraints from the S/N > 75 catalog. The corner plot shows 1D marginalized posteriors and 2D contours (68% and 95% credible regions) obtained with all three pipelines. The inset table provides median values and 68% C.I. for each parameter.
  • Figure 4: Comparison of constraints from the S/N > 60 and S/N > 75 catalogs. Left: 2D marginalized posterior for $(H_0, \Omega_{\rm m,0})$, shown as 68% and 95% credible regions. Center and right: predictive posterior distributions for the primary mass spectrum and the redshift event rate (Eq. \ref{eq:p-z}), respectively. Dashed lines indicate the blinded fiducial values and population model.
  • Figure 5: Top: The 1-$\sigma$ contours of the predictive posterior distribution for $H(z)/(1+z)$. Bottom: The relative precision on $H(z)$, computed as the width of the 1-$\sigma$ C.I. divided by twice the median of $H(z)$. Dashed vertical lines indicate the redshift at which the constraint on $H(z)$ is strongest.
  • ...and 3 more figures
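
The data-volume formula in the Figure 2 caption, $D = 3\times(N_{\rm obs}\times N_{\rm PE} + N_{\rm inj})$ with $N_{\rm PE}=5000$ and $N_{\rm inj} = N_{\rm obs}\times 390$, can be evaluated directly to get a feel for the catalog sizes involved. The sketch below is only an illustration: the function names are hypothetical, the leading factor of 3 is taken from the caption without interpreting it, and the memory estimate assumes 8-byte floating-point values, which the source does not state.

```python
def total_data_points(n_obs, n_pe=5000, inj_per_event=390, n_values=3):
    """D = n_values * (N_obs * N_PE + N_inj), with N_inj = N_obs * inj_per_event."""
    n_inj = n_obs * inj_per_event
    return n_values * (n_obs * n_pe + n_inj)


def memory_gib(n_points, bytes_per_value=8):
    """Raw storage in GiB, assuming one float64 per data point (an assumption)."""
    return n_points * bytes_per_value / 2**30


# Illustrative event counts; 12_000 matches the catalog size quoted in the abstract.
for n_obs in (1_000, 12_000, 100_000):
    d = total_data_points(n_obs)
    print(f"N_obs = {n_obs:>7,d}: D = {d:,d} points, about {memory_gib(d):.2f} GiB")
```

This counts only the stored samples; intermediate arrays created during a likelihood evaluation would add to the footprint, so the figure should be read as a lower bound under these assumptions.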