Initial Conditions from Galaxies: Machine-Learning Subgrid Correction to Standard Reconstruction
Liam Parker, Adrian E. Bayer, Uros Seljak
TL;DR
We reconstruct the primordial density field from late-time biased tracers by combining standard BAO reconstruction with a learned subgrid CNN correction that operates on small subvolumes and is tiled across the full survey volume. The method is trained on Quijote halo and galaxy mocks in both real and redshift space. Compared with standard reconstruction, it achieves a higher cross-correlation with the true initial field and substantially tighter BAO constraints across the box sizes tested. Key contributions include the sliding-window CNN architecture, a Fourier-space loss, transfer to larger volumes without retraining, and demonstrated resilience to HOD misspecification. The approach yields scalable, high-fidelity reconstruction that can benefit cosmological analyses for DESI-like surveys by capturing nonlinearities and bias without sacrificing large-scale accuracy.
Abstract
We present a hybrid method for reconstructing the primordial density field from late-time halos and galaxies. Our approach has two steps: (1) apply standard Baryon Acoustic Oscillation (BAO) reconstruction to recover the large-scale features of the primordial density field, and (2) train a deep learning model to learn small-scale corrections on partitioned subgrids of the full volume. At inference, this correction is convolved across the full volume, which allows the method to scale to large surveys. We train on mock halo catalogs and mock galaxy catalogs, in both configuration and redshift space, from the Quijote $(1\,h^{-1}\,\mathrm{Gpc})^3$ simulation suite. When evaluated on held-out simulations, the combined approach significantly improves the cross-correlation coefficient between the reconstruction and the true initial density field, and it remains robust to moderate model misspecification. We also show that models trained on $(1\,h^{-1}\,\mathrm{Gpc})^3$ boxes can be applied to larger boxes, e.g., $(3\,h^{-1}\,\mathrm{Gpc})^3$, without retraining. Finally, we perform a Fisher analysis of the recovered BAO peak and find that our method significantly reduces the error on the acoustic scale relative to standard BAO reconstruction. The method captures nonlinearities and bias without sacrificing large-scale accuracy, and its ability to handle arbitrarily large volumes without escalating computational requirements makes it especially promising for large-volume surveys like DESI.
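The reconstruction quality quoted above is measured by the cross-correlation coefficient between the reconstructed and true initial fields. As an illustration only, the sketch below computes the standard binned estimate $r(k) = P_{ab}(k) / \sqrt{P_{aa}(k)\,P_{bb}(k)}$ for two real fields on the same periodic cubic grid. The function name, the linear binning between the fundamental and Nyquist wavenumbers, and the neglect of half-plane mode weighting in the real FFT are assumptions made here for brevity, not details taken from the paper.

```python
import numpy as np

def cross_correlation_coefficient(field_a, field_b, box_size, n_bins=8):
    """Binned r(k) = P_ab / sqrt(P_aa * P_bb) for two real fields on the
    same periodic cubic grid. Hypothetical helper, not the authors' code."""
    n = field_a.shape[0]
    fa = np.fft.rfftn(field_a)
    fb = np.fft.rfftn(field_b)

    # Wavenumber magnitude |k| on the rfftn grid (units of 1 / box_size units).
    k_full = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    k_half = 2 * np.pi * np.fft.rfftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k_full, k_full, k_half, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()

    # Linear bins from the fundamental mode up to the Nyquist frequency.
    edges = np.linspace(2 * np.pi / box_size, np.pi * n / box_size, n_bins + 1)
    idx = np.digitize(k_mag, edges) - 1
    valid = (idx >= 0) & (idx < n_bins)  # drops k = 0 and modes beyond Nyquist

    def binned_sum(values):
        return np.bincount(idx[valid], weights=values.ravel()[valid],
                           minlength=n_bins)

    # FFT normalization cancels in the ratio, so raw mode sums suffice.
    p_ab = binned_sum((fa * np.conj(fb)).real)
    p_aa = binned_sum(np.abs(fa) ** 2)
    p_bb = binned_sum(np.abs(fb) ** 2)

    denom = np.sqrt(p_aa * p_bb)
    r = np.full(n_bins, np.nan)  # empty bins stay NaN
    np.divide(p_ab, denom, out=r, where=denom > 0)
    k_centers = 0.5 * (edges[:-1] + edges[1:])
    return k_centers, r

# Toy usage with random fields standing in for the true and reconstructed fields.
rng = np.random.default_rng(0)
truth = rng.standard_normal((16, 16, 16))
noisy = truth + rng.standard_normal((16, 16, 16))
k, r_noisy = cross_correlation_coefficient(truth, noisy, box_size=1000.0)
```

A perfect reconstruction gives $r(k) = 1$ at every scale, while uncorrelated noise in the reconstruction pulls $r(k)$ toward zero.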
