An algorithm for atom-centered lossy compression of the atomic orbital basis in density functional theory calculations

Anthony O. Lara, Justin J. Talbot, Zhe Wang, Martin Head-Gordon

TL;DR

This paper tackles the computational burden of approaching the complete basis set limit in DFT by introducing an atom-centered compression scheme based on natural atomic orbitals (NAOs), obtained from atomic blocks of the density matrix in a one-center orthogonalized representation. NAOs provide a physically meaningful, localized basis, so a single occupation threshold can prune weakly occupied functions while preserving accuracy. In HF tests with the QZ pc-3 basis, thresholds around $10^{-5}$ yield compression factors of $2.5$–$4.5$ with energy errors typically below $0.1$ kcal/mol, while a tighter threshold ($10^{-7}$) gives errors usually below $0.01$ kcal/mol at compression factors of about $2$–$2.5$. The results show that larger basis sets are more compressible, that relative energies are often robust to compression because of error cancellation, and that the method offers a practical route to accelerating SCF calculations with large basis sets. Future work includes dual-basis corrections and exploiting the compression to speed up linear-algebra and RI-based steps. Overall, the work is a proof of concept for controllable AO-basis compression that preserves essential electronic-structure information while substantially reducing basis size and computational effort.
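
A back-of-the-envelope reading of these numbers, assuming (as the Abstract states) that the cost of the dominant steps scales as the cube of the basis dimension $N$. This is an idealized estimate for those cubically scaling steps only, not a timing reported in the paper; the symbols $t$ and $\delta E$ are introduced here for illustration:

$$
f = \frac{N_{\mathrm{AO}}}{N_{\mathrm{NAO}}}, \qquad
\frac{t_{\mathrm{AO}}}{t_{\mathrm{NAO}}} \approx \left(\frac{N_{\mathrm{AO}}}{N_{\mathrm{NAO}}}\right)^{3} = f^{3}
$$

so $f = 2.5$ and $f = 4.5$ correspond to idealized reductions of about $15.6\times$ and $91\times$ in those steps, while the nearly twofold growth of $N$ per cardinal-number increment implies roughly $2^3 = 8\times$ more work. For a relative energy $\Delta E = E_B - E_A$, the error introduced by compression is

$$
\delta(\Delta E) = \delta E_B - \delta E_A, \qquad \delta E_X = E_X^{\mathrm{compressed}} - E_X^{\mathrm{full}}
$$

which is why compression errors of similar size and sign in the two species partially cancel.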

Abstract

Large atomic-orbital (AO) basis sets of at least triple and preferably quadruple-zeta (QZ) size are required to adequately converge Kohn-Sham density functional theory (DFT) calculations towards the complete basis set limit. However, incrementing the cardinal number by one nearly doubles the AO basis dimension, and the computational cost scales as the cube of the AO dimension, so such calculations are very computationally demanding. In this work, we develop and test a natural atomic orbital (NAO) scheme in which the NAOs are obtained as eigenfunctions of atomic blocks of the density matrix in a one-center orthogonalized representation. The NAO representation enables one-center compression of the AO basis in a manner that is optimal for a given threshold, by discarding NAOs with occupation numbers below that threshold. Extensive tests using the Hartree-Fock functional suggest that a threshold of $10^{-5}$ can yield a compression factor (ratio of AO to compressed NAO dimension) between 2.5 and 4.5 for the QZ pc-3 basis. The errors in relative energies are typically less than 0.1 kcal/mol when the compressed basis is used instead of the uncompressed basis. Errors 10 to 100 times smaller (i.e., usually less than 0.01 kcal/mol) can be obtained with a threshold of $10^{-7}$, while the compression factor is then typically between 2 and 2.5.
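
The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes that the "one-center orthogonalized representation" is a symmetric (Löwdin) orthogonalization of each atom's own AO block, so the occupations are the eigenvalues of $S_{AA}^{1/2} P_{AA} S_{AA}^{1/2}$ (equivalently of $P_{AA} S_{AA}$) and the retained NAOs are back-transformed to AO coefficients $S_{AA}^{-1/2} U_A$. The function name `compress_nao` and its arguments are hypothetical.

```python
import numpy as np


def _sqrt_and_inv_sqrt(s):
    """Symmetric square root and inverse square root of an SPD matrix block."""
    w, v = np.linalg.eigh(s)
    return (v * np.sqrt(w)) @ v.T, (v / np.sqrt(w)) @ v.T


def compress_nao(P, S, atom_slices, threshold=1e-5):
    """Hypothetical sketch: per-atom NAOs, keeping those with occupation >= threshold.

    P, S        : AO density and overlap matrices (n_ao x n_ao, symmetric)
    atom_slices : list of slice objects giving each atom's AO indices
    Returns (C, occupations, compression_factor). C (n_ao x n_kept) is
    block-diagonal over atoms and holds the retained NAOs as AO coefficients.
    """
    n_ao = P.shape[0]
    blocks, occupations = [], []
    for sl in atom_slices:
        s_half, s_inv_half = _sqrt_and_inv_sqrt(S[sl, sl])
        # Atomic block of P in the (assumed Lowdin) one-center orthogonalized basis
        p_orth = s_half @ P[sl, sl] @ s_half
        occ, u = np.linalg.eigh(p_orth)
        order = np.argsort(occ)[::-1]  # sort NAOs by decreasing occupation
        occ, u = occ[order], u[:, order]
        occupations.append(occ)
        keep = occ >= threshold  # discard weakly occupied NAOs
        # Back-transform the kept NAOs to AO coefficients: C_A = S_AA^{-1/2} U_A
        blocks.append((sl, s_inv_half @ u[:, keep]))
    n_kept = sum(c.shape[1] for _, c in blocks)
    if n_kept == 0:
        raise ValueError("threshold discards every NAO")
    C = np.zeros((n_ao, n_kept))
    col = 0
    for sl, c in blocks:  # place each atom's kept NAOs on its own rows only
        C[sl, col:col + c.shape[1]] = c
        col += c.shape[1]
    return C, occupations, n_ao / n_kept
```

For example, with $S = I$ and $P = \mathrm{diag}(2, 10^{-8}, 1.5, 10^{-3})$ split into two atoms of two functions each, a threshold of $10^{-5}$ discards one function and gives a compression factor of $4/3$. Because $C$ is block-diagonal over atoms, every compressed function stays on a single center, which is what makes the compression atom-centered.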

Paper Structure

This paper contains 13 sections, 18 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Distribution of occupations for the NAOs of three 30-carbon chain hydrocarbon oligomers. (a) Occupation-number decay for $\mathrm{C_{30}H_{62}}$, $\mathrm{C_{30}H_{32}}$, and $\mathrm{C_{30}H_{2}}$ versus basis ratio (i.e., the x-axis counts eigenvectors in order of occupation number). The inset shows a zoomed-in view of the largest occupation numbers for the minimal basis of $\mathrm{C_{30}H_{62}}$. (b) Per-atom basis ratios of $\mathrm{C_{30}H_{62}}$ as a function of threshold $\left(10^{-\epsilon}\right)$, where the x-axis is the atom index, with the 30 C atoms first, followed by the 62 H atoms.
  • Figure 2: The most strongly occupied NAOs for the innermost carbon $\left(\mathrm{C15}\right)$ and one of its attached hydrogens in $\mathrm{C_{30}H_{62}}$. Orbitals (a), (b), and (c) are the three most highly occupied C NAOs, in decreasing order of occupation. NAOs (a) and (c) retain clear C(1s)- and C(2p)-like shapes, while (b) appears to be a polarized variant of the C(2s) orbital. For hydrogen (d), the dominant NAO is a slightly distorted version of the atomic H(1s) AO. These molecular NAOs thus resemble the corresponding free-atom AOs, with small environment-induced perturbations.
  • Figure 3: Occupation-number decay for $\mathrm{C_{n}H_{2n+2}}$ chains of increasing length as a function of basis ratio, shown for numerically significant occupation numbers. Short chains are remarkably compressible, and compressibility decreases with chain length toward an asymptotic value for the longest chain. The inset shows that a minimal basis of significantly occupied NAOs is recovered for each $\mathrm{C_{n}H_{2n+2}}$; the stratification of the most significant occupation numbers is evident in both the smallest and the largest alkanes. Construction of the minimal basis is independent of system size, whereas compressibility and the corresponding occupation decay rate depend on system size. This size dependence levels off for moderately sized systems (chain length of about 10).
  • Figure 4: Compression factor as a function of threshold for basis sets of increasing cardinality for $\mathrm{C_{30}H_{62}}$. The smallest bases, pc-0 and pc-1, show little to no compressibility over our working threshold range. The larger bases, pc-2, pc-3, and pc-4, are increasingly compressible and yield larger compression factors over the same range. The larger the basis, the greater the fraction of NAOs that are insignificant for describing the SCF energy and density.
  • Figure 5: Electron error and absolute energy error for the pc-2, pc-3, and pc-4 basis sets for $\mathrm{C_{30}H_{62}}$. (a) Pre-SCF electron error as a function of threshold. The largest basis set, pc-4, exhibits the smallest error across the range, while the smallest, pc-2, generally has the largest error. (b) Absolute energy error as a function of threshold. As with the electron error, the error is smallest for pc-4 and largest for pc-2. The electron error and energy error behave nearly identically with respect to the threshold.
  • ...and 4 more figures
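
Figures 1(b) and 4 plot per-atom basis ratios and compression factors against the threshold $10^{-\epsilon}$. A hedged sketch of how such curves could be tabulated from per-atom NAO occupations follows. It assumes that the per-atom basis ratio is the fraction of an atom's functions kept at a given threshold, and that the compression factor is the total AO count divided by the total kept count, as defined in the Abstract; `threshold_scan` is a hypothetical helper name.

```python
import numpy as np


def threshold_scan(atom_occupations, exponents):
    """Hypothetical helper: basis ratios and compression factors for thresholds 10**-eps.

    atom_occupations : list of 1-D arrays (or lists), the NAO occupations of each atom
    exponents        : iterable of eps values; the threshold is 10**(-eps)
    Returns (ratios, factors): ratios[i, a] is the fraction of atom a's
    functions kept at the i-th threshold; factors[i] = n_total / n_kept.
    """
    sizes = np.array([len(o) for o in atom_occupations])
    ratios, factors = [], []
    for eps in exponents:
        thr = 10.0 ** (-eps)
        # Count functions per atom whose occupation meets the threshold
        kept = np.array([np.count_nonzero(np.asarray(o) >= thr) for o in atom_occupations])
        ratios.append(kept / sizes)
        factors.append(sizes.sum() / kept.sum())
    return np.array(ratios), np.array(factors)
```

Loosening the threshold (smaller $\epsilon$) keeps fewer functions per atom and raises the compression factor; at a very tight threshold every function is kept and the factor returns to 1.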