ELUCID-DESI I: A Parallel MPI Implementation of the Initial Condition Solver for Large-Scale Reconstruction Simulations
Wensheng Hong, Xiaohu Yang, Junde Li, Huiyuan Wang, Zhao Chen, Hong-Ming Zhu, Qingyang Li, Yizhou Gu, Youcai Zhang, Feng Shi, Jiaxin Han, Yu Yu, Zhongxu Zhai
TL;DR
This work delivers a scalable MPI-parallel implementation of the Hamiltonian Markov chain Monte Carlo (HMCMC)-based initial-condition reconstruction framework for large-scale cosmological simulations. It enables ELUCID-DESI-scale runs by distributing memory across nodes via 3D domain decomposition and uses FastPM as the forward model. A novel module that guesses the initial density field substantially shortens burn-in and reduces total CPU time, making reconstructions with $1024^3$ up to $8192^3$ particles practical. Performance tests show near-linear scaling with both particle number and core count, together with efficient memory use, providing a basis for planning $8192^3$ reconstructions for near-future surveys. The approach substantially lowers the computational barrier to constrained simulations tied to DESI, CSST, Euclid, and similar surveys, while preserving reconstruction fidelity on large scales and allowing controlled smoothing on small scales.
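The text states only that memory is distributed via 3D domain decomposition. As a minimal sketch, assuming a simple block decomposition of an $N^3$ mesh over a near-cubic process grid, the sub-volume each MPI rank owns can be computed as below. The helpers `dims_create`, `split_axis`, and `local_block` are hypothetical, loosely modeled on what `MPI_Dims_create` and a row-major Cartesian communicator provide; this is an illustration, not the code's actual data layout.

```python
def dims_create(nranks: int) -> tuple:
    """Return a near-cubic 3D process grid (px >= py >= pz) with px*py*pz == nranks.

    Hypothetical helper, similar in spirit to MPI_Dims_create: among all
    factorizations, pick the one with the smallest spread max(p) - min(p).
    """
    best = None
    for px in range(1, nranks + 1):
        if nranks % px:
            continue
        rest = nranks // px
        for py in range(1, rest + 1):
            if rest % py:
                continue
            dims = tuple(sorted((px, py, rest // py), reverse=True))
            spread = dims[0] - dims[2]
            if best is None or spread < best[0]:
                best = (spread, dims)
    return best[1]


def split_axis(n: int, p: int, i: int) -> tuple:
    """Half-open range [start, stop) of the n cells owned by part i of p.

    The first n % p parts each receive one extra cell.
    """
    base, rem = divmod(n, p)
    start = i * base + min(i, rem)
    stop = start + base + (1 if i < rem else 0)
    return start, stop


def local_block(n: int, dims: tuple, rank: int) -> tuple:
    """Cell ranges along (x, y, z) owned by `rank` for an n^3 mesh.

    Ranks map to grid coordinates in row-major order (z fastest),
    which is an assumption of this sketch.
    """
    _, py, pz = dims
    cx, rem = divmod(rank, py * pz)
    cy, cz = divmod(rem, pz)
    return tuple(split_axis(n, p, c) for p, c in zip(dims, (cx, cy, cz)))


if __name__ == "__main__":
    n, nranks = 256, 64
    dims = dims_create(nranks)
    print("process grid:", dims)
    for rank in (0, 21, nranks - 1):
        print("rank", rank, "owns", local_block(n, dims, rank))
```

With a $256^3$ mesh on 64 ranks, each rank owns a $64^3$ block, so per-rank memory falls roughly as $1/P$ for $P$ ranks. A real particle-mesh code must also handle halo (ghost) regions and particles migrating between domains, which this sketch omits.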
Abstract
We present a highly scalable, MPI-parallelized framework for reconstructing the initial cosmic density field, designed to meet the computational demands of next-generation cosmological simulations, in particular the upcoming ELUCID-DESI simulation based on DESI BGS data. Building on the Hamiltonian Monte Carlo approach and the FastPM solver, our code uses domain decomposition to distribute memory efficiently across nodes. Although communication overhead increases the per-step runtime of the MPI version by roughly a factor of eight relative to the shared-memory implementation, our scaling tests, which span different particle numbers, core counts, and node layouts, show nearly linear scaling with respect to both the number of particles and the number of CPU cores. Furthermore, to substantially reduce the computational cost of the initial burn-in phase, we introduce a novel ``guess'' module that rapidly generates a high-quality initial density field. Simulation tests confirm substantial efficiency gains: for $256^3$ particles, 53 steps ($\sim$54 CPU hours) are saved; for $1024^3$ particles, 106 steps ($\sim$7500 CPU hours) are saved. The relative gain grows with the number of particles, making large-volume reconstructions computationally practical for upcoming surveys, including our planned ELUCID-DESI reconstruction simulation with $8192^3$ particles, for which we roughly estimate 720 steps ($\sim$37,000,000 CPU hours).
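As a rough consistency check on the quoted figures (which are rounded, and the $8192^3$ entry is only an estimate), one can divide CPU hours by steps to get the implied cost per step and compare its growth with the growth in particle number. The function name `implied_costs` is hypothetical; the normalized column equals 1 when the per-step cost grows exactly linearly with particle number.

```python
# Figures quoted in the abstract: (particles per side, steps, approximate CPU hours).
# The values are rounded ("~") and the 8192^3 entry is a rough estimate.
CASES = [
    (256, 53, 54.0),
    (1024, 106, 7_500.0),
    (8192, 720, 37_000_000.0),
]


def implied_costs(cases):
    """For each case, return (n, CPU hours per step, particle ratio, normalized cost).

    Normalized cost = (per-step cost ratio) / (particle-number ratio), both taken
    relative to the first case; exactly linear scaling in particle number gives 1.
    """
    n0, steps0, hours0 = cases[0]
    cost0 = hours0 / steps0
    rows = []
    for n, steps, hours in cases:
        cost = hours / steps
        particle_ratio = (n / n0) ** 3
        rows.append((n, cost, particle_ratio, (cost / cost0) / particle_ratio))
    return rows


if __name__ == "__main__":
    for n, cost, pr, norm in implied_costs(CASES):
        print(f"{n}^3: {cost:,.1f} CPU h/step, {pr:,.0f}x particles, normalized {norm:.2f}")
```

The quoted numbers imply roughly 1.0, 71, and 51,000 CPU hours per step. Going from $256^3$ to $1024^3$ (64 times more particles) raises the per-step cost by about 69 times, a normalized value of about 1.09; the same formula applied to the $8192^3$ estimate gives about 1.54.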
