ELUCID-DESI I: A Parallel MPI Implementation of the Initial Condition Solver for Large-Scale Reconstruction Simulations

Wensheng Hong, Xiaohu Yang, Junde Li, Huiyuan Wang, Zhao Chen, Hong-Ming Zhu, Qingyang Li, Yizhou Gu, Youcai Zhang, Feng Shi, Jiaxin Han, Yu Yu, Zhongxu Zhai

TL;DR

This work delivers a scalable MPI-parallel implementation of the HMCMC-based initial-condition reconstruction framework for large-scale cosmological simulations. It enables ELUCID-DESI-scale runs by distributing memory across nodes via 3D domain decomposition and by using FastPM as the forward model. A novel initial-density-field guess module shortens burn-in and lowers total CPU time, making giga- to tera-particle reconstructions practical. Performance evaluations show near-linear scaling of runtime and per-process memory with problem size, supporting planning for $8192^3$-particle reconstructions for upcoming surveys. The approach lowers the computational barrier for constrained simulations tied to DESI, CSST, Euclid, and similar surveys, while preserving reconstruction fidelity on large scales and allowing controllable smoothing on small scales.
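
To make the memory-distribution idea concrete, here is a minimal sketch of how a cubic mesh could be split into blocks over a 3D grid of processes. It is an illustration only, not the authors' implementation: the function names, the even-split rule, and the example process grid are all assumptions, and no MPI library is used.

```python
# Hypothetical illustration of 3D block domain decomposition (not the paper's code).

def split_axis(n_cells, n_parts, index):
    """Return the half-open [start, stop) cell range owned by part `index`.

    Cells are spread as evenly as possible; the first `n_cells % n_parts`
    parts each receive one extra cell.
    """
    base, extra = divmod(n_cells, n_parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def local_block(n_grid, proc_grid, coords):
    """Cell ranges along x, y, z for the process at `coords` in `proc_grid`."""
    return tuple(split_axis(n_grid, p, c) for p, c in zip(proc_grid, coords))


def cells_per_process(n_grid, proc_grid, coords):
    """Number of mesh cells stored by the process at `coords`."""
    count = 1
    for start, stop in local_block(n_grid, proc_grid, coords):
        count *= stop - start
    return count


if __name__ == "__main__":
    n_grid = 1024            # cells per side (assumed example value)
    proc_grid = (4, 4, 4)    # assumed 64-process layout
    print(local_block(n_grid, proc_grid, (1, 0, 3)))
    print(cells_per_process(n_grid, proc_grid, (1, 0, 3)), "of", n_grid ** 3)
```

With an even split, each process holds roughly `n_grid**3 / P` cells for `P` processes, which is why per-process memory can fall as processes are added. Ghost zones and communication buffers, which a real particle-mesh code needs, are ignored here.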

Abstract

We present a highly scalable, MPI-parallelized framework for reconstructing the initial cosmic density field, designed to meet the computational demands of next-generation cosmological simulations, particularly the upcoming ELUCID-DESI simulation based on DESI BGS data. Building upon the Hamiltonian Monte Carlo approach and the FastPM solver, our code employs domain decomposition to distribute memory efficiently across nodes. Although communication overhead increases the per-step runtime of the MPI version by roughly a factor of eight relative to the shared-memory implementation, our scaling tests, spanning different particle numbers, core counts, and node layouts, show nearly linear scaling with respect to both the number of particles and the number of CPU cores. Furthermore, to reduce the computational cost of the initial burn-in phase, we introduce a novel "guess" module that rapidly generates a high-quality initial density field. Simulation tests confirm substantial efficiency gains: for $256^3$ particles, 53 steps ($\sim$54 CPU hours) are saved; for $1024^3$ particles, 106 steps ($\sim$7500 CPU hours) are saved. The relative gain grows with the number of particles, making large-volume reconstructions computationally practical for upcoming surveys, including our planned ELUCID-DESI reconstruction simulation with $8192^3$ particles, with a rough estimate of 720 steps ($\sim$37,000,000 CPU hours).
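
The step counts and CPU hours quoted above imply a per-step cost at each problem size, which can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures from the abstract; the dictionary layout and function names are illustrative assumptions, and the results are therefore indicative only.

```python
# Rounded figures quoted in the abstract:
# label -> (particles per side, steps quoted, CPU hours quoted).
QUOTED = {
    "256^3": (256, 53, 54.0),
    "1024^3": (1024, 106, 7500.0),
    "8192^3 (planned, rough estimate)": (8192, 720, 37_000_000.0),
}


def cpu_hours_per_step(steps, cpu_hours):
    """Average CPU hours per step implied by the quoted totals."""
    return cpu_hours / steps


def cpu_hours_per_step_per_particle(n_side, steps, cpu_hours):
    """Per-step cost divided by the total particle count n_side**3."""
    return cpu_hours_per_step(steps, cpu_hours) / n_side ** 3


if __name__ == "__main__":
    for label, (n_side, steps, hours) in QUOTED.items():
        per_step = cpu_hours_per_step(steps, hours)
        per_particle = cpu_hours_per_step_per_particle(n_side, steps, hours)
        print(f"{label}: {per_step:.4g} CPU h/step, "
              f"{per_particle:.3g} CPU h/step/particle")
```

With the quoted figures this gives about 1.0, 71, and 51,000 CPU hours per step. Going from $256^3$ to $1024^3$ multiplies the particle count by 64 and the implied per-step cost by roughly 69, which is in line with the near-linear scaling reported, given that the inputs are rounded.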

Paper Structure (39 sections, 21 equations, 8 figures, 1 table)

This paper contains 39 sections, 21 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Schematic overview of the reconstruction pipeline. The workflow consists of two main stages. Left panel: Pre-processing stage, which prepares the necessary inputs: (a) reconstruction of the three-dimensional observed density field from survey data, and (b) calibration of the transfer function $T(k)$ to correct for inaccuracies in the fast particle-mesh forward model. Right panel: The core HMCMC reconstruction loop, which iteratively samples the initial density field. The process begins by (c) generating an initial-condition guess, either (c.1) randomly or (c.2) via our novel guess module, and then (d) evolving it forward in time using the particle-mesh solver. Next, (e) the code evaluates the likelihood by comparing the predicted final field with the observed input and updates the initial conditions via Hamiltonian dynamics. (f) This loop repeats until a predefined number of samples has been collected or the required reconstruction accuracy is achieved. (g) The final reconstructed initial density field is output for subsequent studies.
  • Figure 2: Parallel scaling performance. (Left) Processor-number scaling for fixed particle numbers ($256^3$, $512^3$, $1024^3$ particles). Runtime per iteration is normalized to show parallel efficiency relative to ideal linear scaling (dashed line). Larger problems scale better. (Right) Particle-number scaling at fixed MPI process counts. Runtime scales nearly linearly with total particle count (dashed line). The flatter slopes for higher process counts originate from load imbalance at the smallest problem size used for normalization (see Section \ref{sec:speed}).
  • Figure 3: Memory usage per MPI process. (Left) Processor-number scaling: Memory per process for fixed reconstruction particle numbers ($256^3$, $512^3$, $1024^3$), normalized to show efficient reduction relative to ideal scaling (dashed line). (Right) Particle-number scaling: Memory per process as a function of total particle count, with the number of MPI processes held constant for each curve. All curves exhibit highly linear scaling, closely following the ideal dashed line, confirming predictable memory requirements essential for large-scale initial condition reconstruction (see Section \ref{sec:memoryusage}).
  • Figure 4: $\chi^2_{\omega}$ values as a function of the number of steps $N$. Results are shown for chains started from a random field (solid curves) and from the guess module (dashed curves) for three problem sizes ($256^3$, $1024^3$, $2048^3$). In all cases, the guess module delivers a better initial condition (lower starting $\chi^2_{\omega}$) and yields faster convergence. This speed-up becomes increasingly significant as the problem size grows.
  • Figure 5: Visual and quantitative assessment of a $256^3$ reconstruction. Top left: True initial density field at $z_{\mathrm{ini}}$. Top middle: Reconstructed initial density field. Top right: Relative error map, showing no coherent large-scale bias. Bottom left: Final density field at $z=0$ evolved from the true initial conditions (the reconstruction target). Bottom middle: Final density field at $z=0$ evolved from the reconstructed initial conditions. Bottom right: Phase-correlation coefficient $r(k)$ of Fourier phases. The blue line shows $C_p(k)$ between the smoothed final field ($\rho_{\mathrm{mod}}$) and the true final field. The orange line shows $C_p(k)$ between the reconstructed initial field and the true initial field. The green line shows $C_p(k)$ between the final field evolved from the reconstruction ($\rho_{\mathrm{rc}}$, after CIC assignment) and the true final field. The vertical dashed line marks the theoretical characteristic scale $k_c \approx 1.88/R_s^{0.94}$ from Wang et al. (2013).
  • ...and 3 more figures
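
The characteristic scale quoted in the Figure 5 caption, $k_c \approx 1.88/R_s^{0.94}$, is easy to evaluate for a given smoothing scale. The sketch below is a direct transcription of that expression; the function name is hypothetical, and the units ($R_s$ in $h^{-1}\,$Mpc, $k_c$ in $h\,\mathrm{Mpc}^{-1}$) are an assumption, since the caption does not state them.

```python
def characteristic_wavenumber(r_s):
    """Evaluate k_c ~ 1.88 / R_s**0.94 from the Figure 5 caption.

    Units are an assumption: r_s in h^-1 Mpc gives k_c in h Mpc^-1.
    """
    if r_s <= 0:
        raise ValueError("smoothing scale r_s must be positive")
    return 1.88 / r_s ** 0.94


if __name__ == "__main__":
    # Example smoothing scales, chosen for illustration only.
    for r_s in (1.0, 2.0, 4.0):
        print(f"R_s = {r_s}: k_c ~ {characteristic_wavenumber(r_s):.3f}")
```

A larger smoothing scale gives a smaller $k_c$, so the vertical dashed line in the figure moves toward larger spatial scales as the smoothing increases.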