CosmoUiT: A Vision Transformer-UNet Hybrid for Fast and Accurate Emulation of 21-cm Maps from the Epoch of Reionization
Prasad Rajesh Posture, Yashrajsinh Mahida, Suman Majumdar, Leon Noble
TL;DR
CosmoUiT addresses the challenge of rapidly generating accurate, field-level 3D 21-cm maps of the Epoch of Reionization by integrating a Vision Transformer encoder with a UNet. The model takes dark matter and halo density fields as input and conditions its output on the reionization parameters: the minimum halo mass $M_{h,\min}$, the ionizing efficiency $N_{\mathrm{ion}}$, and the mean free path of ionizing photons $R_{\mathrm{mfp}}$. It captures both large-scale topology and small-scale features, achieving high voxel-wise fidelity to the simulated maps (as measured by MSE, $R^2$, and SSIM) while offering orders-of-magnitude speedups suitable for parameter inference pipelines. Extensive evaluation shows strong performance on the neutral hydrogen fraction ($x_{\mathrm{HI}}$) and differential brightness temperature ($\delta T_b$) fields, including robust out-of-domain generalization to unseen initial-condition seeds, although fuzziness at ionized-bubble boundaries introduces some bias in bubble sizes and small-scale power. The work lays a foundation for fast, field-level inference with upcoming SKA data and outlines concrete avenues to incorporate redshift evolution, uncertainties, and observational systematics for fully realistic forecasts.
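As an illustration of what "voxel-wise fidelity" means, here is a minimal sketch of two of the metrics named above, MSE and $R^2$, computed over every voxel of an emulated cube against its simulated counterpart. It assumes the usual coefficient-of-determination definition $R^2 = 1 - \sum(\text{true}-\text{pred})^2 / \sum(\text{true}-\overline{\text{true}})^2$. The function names and the random toy cubes are hypothetical stand-ins, not the authors' evaluation code. SSIM is omitted; scikit-image provides `skimage.metrics.structural_similarity` for that.

```python
import numpy as np


def voxel_mse(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean squared error averaged over all voxels of the cube."""
    return float(np.mean((pred - true) ** 2))


def voxel_r2(pred: np.ndarray, true: np.ndarray) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot over all voxels."""
    ss_res = np.sum((true - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((true - true.mean()) ** 2)   # total variance of the target cube
    return float(1.0 - ss_res / ss_tot)


# Hypothetical toy data: a random "simulated" x_HI cube and a noisy "emulated" copy,
# clipped to the physical range [0, 1] of a neutral fraction.
rng = np.random.default_rng(0)
x_hi_true = rng.uniform(0.0, 1.0, size=(32, 32, 32))
x_hi_pred = np.clip(x_hi_true + rng.normal(0.0, 0.05, size=x_hi_true.shape), 0.0, 1.0)

print(voxel_mse(x_hi_pred, x_hi_true), voxel_r2(x_hi_pred, x_hi_true))
```

Because both metrics are averaged over every voxel, they reward getting the bulk of the field right; localized errors such as blurred bubble edges can therefore coexist with high scores, which is why field statistics like bubble sizes and small-scale power are checked separately.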
Abstract
The observation of the redshifted 21-cm signal from the intergalactic medium will probe the epoch of reionization (EoR) in unprecedented detail. Various simulations are being developed and used to predict and understand the nature and morphology of this signal. However, these simulations are computationally expensive and time-consuming to produce in large numbers, so an efficient field-level emulator of the signal is required. Building such an emulator is difficult because the EoR 21-cm signal is highly non-Gaussian: a neural network must capture the correlations between different scales of the signal, which are directly tied to how reionization evolves. Here, we introduce CosmoUiT, a vision transformer-based architecture integrated with a UNet, to overcome these difficulties. CosmoUiT emulates 3D cubes of the 21-cm signal from the EoR for a given input dark matter density field, halo density field, and set of reionization parameters. It uses the multi-head self-attention mechanism of the transformer to capture long-range dependencies and the convolutional layers of the UNet to capture small-scale variations in the target 21-cm field. Furthermore, the emulator is trained conditioned on the input reionization parameters, so that it gives fast and accurate predictions of the 21-cm field for different sets of these parameters. We evaluate the emulator by comparing various statistics (e.g., the bubble size distribution and the power spectrum) and morphological features of the emulated and simulated maps. We further demonstrate that this vision transformer-based architecture can emulate the entire 3D 21-cm signal cube with high accuracy at both large and small scales.
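To show how multi-head self-attention lets every region of a cube interact with every other region, the NumPy sketch below splits a 3D field into non-overlapping patches, embeds each patch as a token, and applies scaled dot-product multi-head self-attention. The patch size, embedding width, number of heads, and random weights are hypothetical choices. The sketch deliberately omits the UNet decoder, skip connections, positional embeddings, and the mechanism for conditioning on reionization parameters, none of which the text above specifies.

```python
import numpy as np


def patchify_3d(cube: np.ndarray, p: int) -> np.ndarray:
    """Split an (N, N, N) cube into non-overlapping p^3 patches, one flattened token each."""
    n = cube.shape[0]
    assert n % p == 0, "cube side must be divisible by the patch size"
    g = n // p
    # (g, p, g, p, g, p) -> (g, g, g, p, p, p) -> (g^3 tokens, p^3 values per token)
    x = cube.reshape(g, p, g, p, g, p).transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(g ** 3, p ** 3)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, n_heads):
    """Scaled dot-product self-attention: every token attends to every other token."""
    t, d = tokens.shape
    dh = d // n_heads

    def split(x):  # (t, d) -> (heads, t, dh)
        return x.reshape(t, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)     # (heads, t, t) pairwise affinities
    attn = softmax(scores, axis=-1)                      # each row sums to 1
    out = (attn @ v).transpose(1, 0, 2).reshape(t, d)    # concatenate the heads
    return out @ w_o, attn


# Hypothetical usage: a random cube stands in for a dark matter density field.
rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 32))
tokens = patchify_3d(cube, p=8)        # 4^3 = 64 tokens, each of length 8^3 = 512
d_model, n_heads = 64, 4
w_embed = rng.normal(scale=tokens.shape[1] ** -0.5, size=(tokens.shape[1], d_model))
x = tokens @ w_embed                   # linear patch embedding
w_q, w_k, w_v, w_o = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape, attn.shape)
```

Each row of `attn` is a distribution over all 64 patches, so a patch in one corner of the box can draw information directly from the opposite corner in a single layer. A convolution would need many stacked layers to reach the same distance, which is why the attention path complements the UNet's local convolutions.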
