A Unified Heterogeneous Implementation of Numerical Atomic Orbitals-Based Real-Time TDDFT within the ABACUS Package

Taoni Bao, Yuanbo Li, Zichao Deng, Haotian Zhao, Denghui Lu, Yike Huang, Chao Lian, Lixin He, Mohan Chen

Abstract

We present a unified heterogeneous computing framework for real-time time-dependent density functional theory (RT-TDDFT) based on numerical atomic orbitals (NAOs), implemented in the ABACUS package. The framework rests on three co-designed abstraction layers: unified data containers, unified linear algebra operators, and unified grid integration interfaces. Together, these layers accelerate the two most demanding parts of NAO-based RT-TDDFT: explicit real-time wavefunction propagation, and real-space grid operations such as Hamiltonian construction and force evaluation under external fields. We validate the method by computing optical properties for systems ranging from finite molecules to periodic solids, and the results agree closely with standard benchmarks. Performance evaluations on bulk silicon show that a single GPU can achieve a substantial wall-clock speedup over a fully utilized dual-socket CPU node. Furthermore, distributed multi-GPU strong-scaling tests confirm high parallel efficiency over tens of GPUs. This work establishes a high-performance, portable platform for large-scale first-principles simulations of ultrafast electron dynamics.

Paper Structure

This paper contains 26 sections, 37 equations, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Schematic architecture of the heterogeneous RT-TDDFT implementation in ABACUS, organized into three logical layers: (a) The User Layer handles standard inputs (e.g., structure, numerical atomic orbitals, pseudopotentials) and outputs physical properties such as optical responses. (b) The Algorithm Developer Layer illustrates the main RT-TDDFT workflow, including the self-consistent time-evolution loop, Hamiltonian construction, wavefunction propagation, and Ehrenfest dynamics updates. (c) The Core Underlying Heterogeneous Abstraction Layer underpins the simulation with unified data containers, grid integration interfaces, and portable linear algebra operators, bridging the physics algorithms with diverse hardware backends.
  • Figure 2: Data flow and architectural diagram of the Gint module. The workflow begins with the initialization of the GintInfo class (green block), which manages geometry, grid division, and atom-grid neighbor relationships. The core computation iterates over local grid blocks (BigGrid), utilizing the PhiOperator to evaluate atomic orbitals and unified calculation kernels to construct physical quantities such as density and Hamiltonian matrices. Finally, data is aggregated via MPI and converted to the target storage format.
  • Figure 3: Optical properties of the anthracene molecule. (a) Time-dependent external electric field profile. (b) Time evolution of the induced dipole moment, demonstrating consistency between CPU and GPU calculations. (c) Absorption spectra calculated using length, velocity, and hybrid gauges with the QZTP basis. (d) Basis set convergence test. (e) Comparison with DGDFT benchmark results [acs.jctc.8b00580].
  • Figure 4: Optical properties of the bare (CdSe)6 cluster. (a) Time-dependent external electric field profile. (b) Time evolution of the induced dipole moment, demonstrating consistency between CPU and GPU calculations. (c) Absorption spectra obtained using length, velocity, and hybrid gauges, compared with CP2K benchmark results [Nadler2013].
  • Figure 5: Optical properties of the periodic hydrogen chain. (a) Time-dependent external electric field profile. (b) Time evolution of the macroscopic current density, demonstrating consistency between CPU and GPU calculations. (c) Absorption spectra computed using velocity and hybrid gauges, compared with Qbox benchmark results [jcp.5.0211238].
  • ...and 10 more figures