Mixed-precision ab initio tensor network state methods adapted for NVIDIA Blackwell technology via emulated FP64 arithmetic

Cole Brower, Samuel Rodriguez Bernabeu, Jeff Hammond, John Gunnels, Sotiris S. Xantheas, Martin Ganahl, Andor Menczer, Örs Legeza

TL;DR

This analysis represents the first quantum chemistry evaluation of FP64 emulation based on fixed-point arithmetic for correlated calculations capable of achieving chemical accuracy. It paves the way for the utilization of state-of-the-art Blackwell technology in tree-like tensor network state electronic structure calculations, opening new research directions in materials sciences and beyond.

Abstract

We report cutting-edge performance results via mixed-precision spin-adapted ab initio Density Matrix Renormalization Group (DMRG) electronic structure calculations utilizing the Ozaki scheme for emulating FP64 arithmetic through the use of fixed-point compute resources. By approximating the underlying matrix and tensor algebra with operations on a modest number of fixed-point representatives ("slices"), we demonstrate on smaller benchmark systems and for the active compounds of the FeMoco and cytochrome P450 (CYP) enzymes with complete active space (CAS) sizes of up to 113 electrons in 76 orbitals [CAS(113, 76)] and 63 electrons in 58 orbitals [CAS(63, 58)], respectively, that chemical accuracy can be reached with mixed-precision arithmetic. We also show that, due to its variational nature, DMRG provides an ideal tool to benchmark accuracy domains, as well as the performance of new hardware developments and related numerical libraries. Detailed numerical error analysis and performance assessment are also presented for subcomponents of the DMRG algebra by systematically interpolating between double- and pseudo-half-precision. Our analysis represents the first quantum chemistry evaluation of FP64 emulation based on fixed-point arithmetic for correlated calculations capable of achieving chemical accuracy, and it paves the way for the utilization of state-of-the-art Blackwell technology in tree-like tensor network state electronic structure calculations, opening new research directions in materials sciences and beyond.
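
To make the idea of "slices" concrete, the following is a minimal pure-Python sketch of slice-based FP64 emulation in the spirit of the Ozaki scheme. It is an illustration under stated assumptions, not the library implementation used in the paper: the helper names (`split_rows`, `emulated_matmul`, `max_abs_error`), the truncation-based splitting, the 7 magnitude bits per slice, and the rule of keeping only slice pairs with $s+t<S$ are all choices made here for clarity. Each row of the left matrix and each column of the right matrix is scaled by a power of two and split into small integers; the integer dot products are exact, and only their recombination happens in floating point.

```python
import math
import random

BITS = 7  # assumed magnitude bits per slice, so each digit fits a signed 8-bit integer


def split_rows(mat, num_slices):
    """Split every row into integer slices that share one power-of-two scale.

    Returns (exps, slices) such that
    mat[i][k] ~= 2**exps[i] * sum_s slices[s][i][k] * 2**(-BITS * (s + 1)).
    """
    exps = []
    slices = [[] for _ in range(num_slices)]
    for row in mat:
        amax = max(abs(x) for x in row)
        e = math.frexp(amax)[1] if amax > 0.0 else 0  # guarantees amax < 2**e
        exps.append(e)
        resid = [math.ldexp(x, -e) for x in row]  # every |resid| < 1
        for s in range(num_slices):
            digits = []
            for k, r in enumerate(resid):
                scaled = math.ldexp(r, BITS)
                d = int(scaled)        # truncate toward zero, so |d| <= 127
                digits.append(d)
                resid[k] = scaled - d  # fractional remainder, computed exactly
            slices[s].append(digits)
    return exps, slices


def emulated_matmul(a, b, num_slices):
    """Approximate a @ b from exact integer products of slices."""
    n, m = len(a), len(b[0])
    bt = [list(col) for col in zip(*b)]  # columns of b, stored as rows
    ea, sa = split_rows(a, num_slices)
    eb, sb = split_rows(bt, num_slices)
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for s in range(num_slices):
                for t in range(num_slices - s):  # keep pairs with s + t < num_slices
                    # Integer dot product: exact, no rounding error.
                    dot = sum(x * y for x, y in zip(sa[s][i], sb[t][j]))
                    total += math.ldexp(float(dot), -BITS * (s + t + 2))
            c[i][j] = math.ldexp(total, ea[i] + eb[j])  # undo row/column scaling
    return c


def matmul(a, b):
    """Plain floating-point reference product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def max_abs_error(num_slices, n=8, seed=0):
    """Largest entrywise deviation from the reference for random n x n inputs."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    b = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    ref = matmul(a, b)
    emu = emulated_matmul(a, b, num_slices)
    return max(abs(x - y) for r, e in zip(ref, emu) for x, y in zip(r, e))


if __name__ == "__main__":
    for s in (2, 3, 4, 6, 8):
        print(f"S={s}: max abs error = {max_abs_error(s):.3e}")
```

In this toy setting the error shrinks by roughly a factor of $2^{7}$ per additional slice, which is the sense in which the number of slices interpolates between low and double precision. On real hardware the integer products would run on fixed-point tensor units rather than in Python loops.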

Paper Structure

This paper contains 1 equation, 10 figures.

Figures (10)

  • Figure 1: Relative error of the ground state energy as a function of DMRG iteration steps for the F$_2$ molecule in a CAS(18,18) model space using $D_{SU(2)}=1024$ (left panel) and $D_{SU(2)}=8192$ (right panel) SU(2) multiplets for the non-emulated native FP64 limit and for various numbers of INT8 slices, $S\in\{6,4,3,2\}$, obtained on a DGX B200 system. The dashed line stands for the relative error of chemical accuracy.
  • Figure 2: Similar to Fig. 1 but for the nitrogen dimer at its equilibrium geometry in a CAS(14,28) model space using $D_{SU(2)}=1024$ (left panel) and $D_{SU(2)}=4096$ (right panel) SU(2) multiplets obtained on a DGX B200 system.
  • Figure 3: Shifted ground state energy for the spin-1/2 doublet state of the cytochrome P450 (CYP) enzymes with CAS(63,58) model space as a function of DMRG iteration using $D_{SU(2)}=2048$ SU(2) multiplets and $S\in\{4,6\}$ slices (left panel) and the absolute error measured with respect to the native FP64 data sets for $S=4$ (central panel) and $S=6$ (right panel) slices for various $D$ values obtained on a DGX B200 system.
  • Figure 4: Cumulative diagonalization time in minutes as a function of Lánczos steps for the simulations discussed in Fig. 1, obtained on a DGX B200 system. In practice the performant mode (non-Eager mode), where the system decides when it is faster to run emulation, is used.
  • Figure 5: Benchmark results obtained via the SU(2) spin-adapted single node hybrid CPU plus multi-GPU DMRG calculations for the F$_2$ molecule on a CAS(18,18) orbital space [Legeza-2003a], the N$_2$ molecule on a CAS(14,28) space [Chan-2004b], FeMoco on CAS(54,54) [Reiher-2017] and CAS(113,76) [Li-2019] spaces, and P450 on CAS(63,58) [Goings-2022]. The solid lines correspond to calculations performed on a DGX B200 system via native FP64 precision, while dashed lines correspond to emulated performant mode. As a reference, the dotted lines trace the results obtained on a DGX H100 system [Menczer-2024b]. Numbers indicate the corresponding $U(1)$ bond dimension values, which are the same for the dotted, dashed, and solid lines.
  • ...and 5 more figures