Unified MPI Parallelization of Wave Function Methods: iCIPT2 as a Showcase

Qingpeng Wang, Ning Zhang, Wenjian Liu

TL;DR

This work delivers a unified MPI parallelization framework for wave function methods in the MetaWave platform. Each computational step is cast as a dynamically scheduled loop governed by a ghost process and followed by a global reduction, so a single MPI template can be reused across methods. With iCIPT2 as the showcase, the parallel efficiencies reach 94% for the perturbation (ENPT2) step and 89% for the whole calculation on 16 nodes (1024 cores). An improved matrix-vector product (MVP) algorithm and a semi-stochastic ENPT2 estimator make large active spaces tractable. The benchmarks cover the automerization of cyclobutadiene, the ground-state energy of benzene, and the potential energy profile of ozone, and show that the iCIPT2 error follows a power law in the number of configuration state functions (CSFs). The approach promises broad applicability to non-relativistic and relativistic wave function methods, with ongoing work to address memory bottlenecks via cross-node distribution of CI vectors and to extend to relativistic counterparts.
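
As a rough reading of the quoted efficiencies, the worked relation below converts them into speedups. It assumes that parallel efficiency is defined as the speedup over a single-node run divided by the node count; that definition and the symbols $\eta$, $S$, $T$ and $N$ are assumptions, not taken from the text.

```latex
\eta(N) = \frac{S(N)}{N}, \qquad S(N) = \frac{T(1)}{T(N)}

S_{\text{ENPT2}}(16) \approx 0.94 \times 16 = 15.04, \qquad
S_{\text{total}}(16) \approx 0.89 \times 16 = 14.24
```

Under that assumption, 16 nodes would run the perturbation step about 15 times faster, and the whole calculation about 14 times faster, than one node.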

Abstract

The integration of quantum chemical methods with high-performance computing is indispensable for handling large systems at modest accuracy or even small systems at high accuracy. Continuing with the unified implementation of non-relativistic and relativistic wave function methods within the MetaWave platform (J. Phys. Chem. A 2025, 129, 5170), we present here a unified MPI parallelization of the methods by abstracting every computational step of a method as a dynamically scheduled loop via a ghost process, followed by a global reduction of local results from each node. The algorithmic abstraction enables the use of a single MPI template in various steps of different methods. Taking iCIPT2 [J. Chem. Theory Comput. 2021, 17, 949] as a showcase, the parallel efficiencies reach 94% and 89% on 16 nodes (1024 cores) for the perturbation and whole calculations, respectively. Further combined with an improved algorithm for the matrix-vector product in the matrix diagonalization and an orbital-configuration-based semi-stochastic estimator for the perturbation correction, this renders large active space calculations possible, so as to obtain benchmarks for the automerization of cyclobutadiene, the ground state energy of benzene and the potential energy profile of ozone. It is also shown that the error of iCIPT2 follows a power law with respect to the number of configuration state functions.
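
The pattern described above (a dynamically scheduled loop served by a ghost process, followed by a global reduction) can be illustrated with a minimal sketch. This is not MetaWave code and does not use MPI: threads and queues stand in for processes and messages so the example runs with the Python standard library alone. The names `ghost`, `worker`, `dynamic_loop`, `kernel` and `chunk` are hypothetical, and handing out fixed-size chunks is an assumption about the scheduling granularity.

```python
import queue
import threading


def ghost(task_count, chunk, requests, n_workers):
    """Ghost dispatcher: hands out the next chunk of loop indices on request."""
    next_index = 0
    finished = 0
    while finished < n_workers:
        reply = requests.get()  # a worker asks for more work
        if next_index < task_count:
            stop = min(next_index + chunk, task_count)
            reply.put((next_index, stop))
            next_index = stop
        else:
            reply.put(None)  # loop exhausted: tell this worker to stop
            finished += 1


def worker(kernel, requests, partials, rank):
    """Worker: repeatedly requests a chunk and accumulates a local result."""
    reply = queue.Queue(maxsize=1)  # private channel for the ghost's answers
    local = 0.0
    while True:
        requests.put(reply)
        assignment = reply.get()
        if assignment is None:
            break
        start, stop = assignment
        for i in range(start, stop):
            local += kernel(i)
    partials[rank] = local


def dynamic_loop(kernel, task_count, n_workers=4, chunk=8):
    """Run kernel(i) for i in range(task_count) and reduce the local sums."""
    requests = queue.Queue()
    partials = [0.0] * n_workers
    threads = [threading.Thread(target=ghost,
                                args=(task_count, chunk, requests, n_workers))]
    threads += [threading.Thread(target=worker,
                                 args=(kernel, requests, partials, rank))
                for rank in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # stand-in for the global reduction step


if __name__ == "__main__":
    print(dynamic_loop(lambda i: i * i, 100))
```

Because workers pull chunks only when they are idle, uneven costs per loop index balance out without a static partition. In an actual MPI code, the request and reply would presumably be point-to-point messages and the final sum a collective reduction such as `MPI_Allreduce`.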

Paper Structure (19 sections, 16 equations, 10 figures, 5 tables, 14 algorithms)

Figures (10)

  • Figure 1: 3D convergence problem of electronic structure theory.
  • Figure 2: Process distribution in MetaWave.
  • Figure 3: Dynamic scheduling via ghost process.
  • Figure 4: Serialization module in MetaWave. (a) Memory distribution during the serialization/deserialization process; (b) timeline of the serialization/deserialization process. In (b), black boxes denote procedures that require calls to MPI subroutines.
  • Figure 5: Wall times (left) and speedups (right) for the frozen-core CAS(20e,272o)-iCIPT2[$C_{\text{min}}=5\times10^{-6}$]/aug-cc-pVTZ calculations of cyclobutadiene on up to 16 nodes (1024 cores). Each node is equipped with two Hygon 7285 CPUs (32 cores each, 2.0 GHz) and 512 GB of DDR4 memory.
  • ...and 5 more figures