Optimising Foreground Modelling for Global 21cm Cosmology with GPU-Accelerated Nested Sampling

Jacob L. Tutt; Peter H. Sims; Joe H. N. Pattison; Dominic J. Anstey; Samuel A. K. Leeney; Eloy de Lera Acedo

Abstract

The global 21-cm signal provides a powerful probe of early-Universe astrophysics, but its detection is hindered by Galactic foregrounds that are orders of magnitude brighter than the signal and by distortions introduced by beam chromaticity. These challenges require accurate foreground modelling, rigorous Bayesian model comparison, and robust validation frameworks. In this work, we substantially accelerate global 21-cm inference by exploiting GPU architectures, enabling likelihood evaluations to achieve near-constant wall-clock time across a wide range of model dimensionalities and data volumes. Combined with algorithmic parallelisation of Nested Sampling, this reduces the total inference runtime for this work from hundreds of CPU-years to approximately two GPU-days, corresponding to a cost reduction of over two orders of magnitude. Leveraging this capability, we advance the physically motivated forward-modelling approach, in which foregrounds are represented by a discrete set of sky regions, by introducing a novel, observation-dependent sky-partitioning scheme that defines regions using the antenna beam-convolved sky power of a given observing window. We show that this scheme improves modelling performance in three ways: firstly, by enforcing a strictly nested region hierarchy that enables clear identification of the Occam penalty in the Bayesian evidence, facilitating principled optimisation of model complexity; secondly, by enabling more accurate recovery of spatially varying spectral indices, with posterior estimates centred within physically plausible ranges; and thirdly, by allowing complex foregrounds to be modelled for robust global 21-cm signal inference using substantially fewer parameters. Overall, this approach achieves validated recovery at lower region counts, corresponding to an approximate 40% reduction in foreground-model dimensionality.
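To make the idea of an observation-dependent, strictly nested partition concrete, the following is a minimal JAX sketch under stated assumptions; it is not the scheme used in the paper. The function names `beam_weighted_power` and `partition`, the array shapes, the random toy inputs, and the choice of equal cumulative-power bins are all hypothetical. With this particular binning rule, doubling the region count splits every existing region in two, which gives one simple example of a nested hierarchy.

```python
import jax
import jax.numpy as jnp


def beam_weighted_power(sky, beam):
    """Per-pixel sky power seen through the beam over an observing window.

    sky:  (n_pix,) sky temperature at a reference frequency (assumed input)
    beam: (n_time, n_pix) beam directivity toward each pixel per time step
    """
    return sky * beam.sum(axis=0)


def partition(power, n_regions):
    """Label pixels by equal shares of cumulative beam-weighted power.

    This binning rule is an illustrative assumption, not the paper's method.
    """
    order = jnp.argsort(power)            # faintest to brightest pixel
    cum = jnp.cumsum(power[order])
    frac = cum / cum[-1]                  # cumulative power fraction in (0, 1]
    labels_sorted = jnp.minimum(
        (frac * n_regions).astype(jnp.int32), n_regions - 1
    )
    # Scatter the labels back to the original pixel ordering.
    return jnp.zeros_like(labels_sorted).at[order].set(labels_sorted)


# Toy inputs: random sky temperatures and beam weights (purely illustrative).
k_sky, k_beam = jax.random.split(jax.random.PRNGKey(0))
sky = jax.random.uniform(k_sky, (192,), minval=100.0, maxval=5000.0)
beam = jax.random.uniform(k_beam, (6, 192))

power = beam_weighted_power(sky, beam)
coarse = partition(power, 4)
fine = partition(power, 8)

# Nestedness check: each fine region lies inside exactly one coarse region.
is_nested = bool(jnp.all(fine // 2 == coarse))
```

In this sketch, nestedness is what allows models with different region counts to be compared as refinements of one another, rather than as unrelated partitions of the sky.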

Paper Structure (38 sections, 22 equations, 14 figures, 4 tables, 2 algorithms)

Introduction
Bayesian Data Analysis Pipeline
Data Simulation
Physically Motivated Foreground Model
Bayesian Inference
Likelihood Function
Prior Distributions
Model Selection
GPU-Accelerated Nested Sampling
Acceleration Mechanism
Likelihood Parallelisation
Algorithmic Parallelisation
Sampler Hyperparameters
Performance Benchmarking
Likelihood Acceleration
...and 23 more sections

Figures (14)

Figure 1: Beam pattern for a conical log spiral antenna, shown as polar projections of the antenna directivity $D(\theta,\phi,\nu)$ in local altitude–azimuth coordinates ($\theta,\phi$) above the horizon ($\theta < 90^\circ$). The three panels show $D(\theta,\phi,\nu)$ at $\nu$ = 50, 125 and 200 MHz to demonstrate the chromatic structure of the beam.
Figure 2: A demonstration of the chromatic structure introduced by the coupling of the Galactic foregrounds and the beam, its degeneracy with global 21-cm signals, and how these effects can be accounted for with the REACH data analysis pipeline. Top panel: A simulated 1-hour time-averaged observation $d(\nu)$ from the REACH telescope in the Karoo Desert, South Africa, at 00:00 01-10-2019 with a conical log spiral antenna. The data include an injected mock 21-cm signal and 0.025 mK of Gaussian noise. The inset shows the residuals after subtracting a fitted smooth power law. Middle panel: Reduced residual structure after subtracting a Bayesian nested sampling fit produced by the REACH pipeline using 16 parametrised regions. Bottom panel: Examples of emulated mock signals from GlobalEMU (Bevins et al. 2021), demonstrating that beam-aware modelling suppresses the residuals below the magnitude of the expected global 21-cm signal.
Figure 3: Schematic of the GPU-accelerated, differentiable REACH Bayesian analysis pipeline. The parameterised forward model combines a global 21-cm signal $\boldsymbol{\theta}_{\text{21cm}}$, diffuse foreground emission $\boldsymbol{\theta}_{\text{FG}}$ and horizon contamination $\theta_{\text{Horizon}}$ with the antenna's beam to generate an antenna-temperature spectrum $\mathbf{M}(\boldsymbol{\theta},t,\nu)$. This model can be statistically compared to the observational data $\mathbf{D}(t,\nu)$ under a specific noise structure through a likelihood function $\mathcal{L}(\mathbf{D}\,|\,\mathbf{M},\boldsymbol{\theta})$. The inference process is optimised through JAX’s XLA compilation (Bradbury et al. 2018), leveraging gradient-based BlackJAX samplers (Cabezas et al. 2024) and the Nested Slice Sampling (NSS) algorithm, implemented in the BlackJAX nested sampling framework (Yallup et al. 2025), for efficient evaluation of the Bayesian posterior $P(\boldsymbol{\theta} \mid \mathbf{D}, \mathbf{M})$ and evidence $\mathcal{Z}(\mathbf{D} \mid \mathbf{M})$.
Figure 4: Performance benchmarking of the likelihood evaluation across varying model complexities. Top: Comparison of mean execution time (ms) for 1000 likelihood calls on an Intel Cascade Lake CPU (with and without JIT compilation) versus an NVIDIA A100 GPU. Bottom: The resulting speed-up factor of the A100 implementation relative to both JIT and non-JIT CPU baselines.
Figure 5: Performance benchmarking of the likelihood evaluation across varying data volumes. Top: Comparison of mean execution time (ms) for 1000 likelihood calls on an Intel Cascade Lake CPU (with JIT compilation) versus an NVIDIA A100 GPU, with the discontinuity highlighted by a red line (dashed/solid). Bottom: The resulting speed-up factor of the A100 implementation relative to the JIT CPU baseline.
...and 9 more figures
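
The likelihood parallelisation referred to in the captions above can be illustrated with a minimal JAX sketch, assuming a toy Gaussian likelihood and a toy region-based power-law foreground model. The names `log_likelihood`, `batched_loglike`, `weights` and `NU0`, along with all numerical values, are hypothetical and are not taken from the REACH pipeline. The sketch shows only the general pattern: write the likelihood for one parameter vector, vectorise it over a batch with `jax.vmap`, and compile it once with `jax.jit`.

```python
import jax
import jax.numpy as jnp

NU0 = 125.0  # reference frequency in MHz (illustrative choice)


def log_likelihood(betas, sigma, data, freqs, weights):
    """Gaussian log-likelihood for a toy region-based foreground model.

    betas:   (n_regions,) spectral index of each sky region
    sigma:   scalar noise standard deviation
    data:    (n_freq,) observed antenna temperature
    freqs:   (n_freq,) frequencies in MHz
    weights: (n_regions, n_freq) beam-weighted amplitude of each region
             (a hypothetical stand-in for the beam-convolved sky)
    """
    spectra = weights * (freqs / NU0) ** (-betas[:, None])
    model = spectra.sum(axis=0)
    resid = data - model
    n = data.size
    return (-0.5 * jnp.sum(resid**2) / sigma**2
            - 0.5 * n * jnp.log(2.0 * jnp.pi * sigma**2))


# Vectorise over a batch of parameter vectors, then compile once with XLA.
batched_loglike = jax.jit(
    jax.vmap(log_likelihood, in_axes=(0, None, None, None, None))
)

# Toy data generated from known spectral indices (illustrative values only).
freqs = jnp.linspace(50.0, 200.0, 151)
weights = jnp.full((4, freqs.size), 500.0)
true_betas = jnp.array([2.4, 2.5, 2.6, 2.7])
data = (weights * (freqs / NU0) ** (-true_betas[:, None])).sum(axis=0)

# Two candidate parameter vectors: the truth and a perturbed copy.
batch = jnp.stack([true_betas, true_betas + 0.1])
logL = batched_loglike(batch, 0.025, data, freqs, weights)
```

On an accelerator, one call to `batched_loglike` evaluates every parameter vector in the batch in parallel, which is the kind of workload a batched nested sampler can exploit.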