Transformer-Based Neural Networks Backflow for Strongly Correlated Electronic Structure

Huan Ma, Bowen Kan, Honghui Shang, Jinlong Yang

TL;DR

Transformer-based backflow is established as a powerful variational ansatz for strongly correlated electronic structure, achieving superior magnetic property predictions while maintaining chemical accuracy in total energies.

Abstract

Solving the electronic Schrödinger equation for strongly correlated systems remains one of the grand challenges in quantum chemistry. Here we demonstrate that Transformer architectures can be adapted to capture the complex grammar of electronic correlations through neural network backflow. In this approach, electronic configurations are processed as token sequences, where attention layers learn non-local orbital correlations and token-specific neural networks map these contextual representations into backflowed orbitals. Application to strongly correlated iron-sulfur clusters validates our approach: for $\left[\mathrm{Fe}_2 \mathrm{S}_2\left(\mathrm{SCH}_3\right)_4\right]^{2-}$ ([2Fe-2S]) (30e,20o), we obtain a ground-state energy within chemical accuracy of DMRG while predicting magnetic exchange coupling constants closer to experimental values than all compared methods, including DMRG, CCSD(T), and recent neural network approaches. For $\left[\mathrm{Fe}_4 \mathrm{S}_4\left(\mathrm{SCH}_3\right)_4\right]^{2-}$ ([4Fe-4S]) (54e,36o), we match DMRG energies and accurately reproduce detailed spin-spin correlation patterns between all Fe centers. The approach scales favorably to large active spaces inaccessible to exact methods, with distributed VMC optimization enabling stable convergence. These results establish Transformer-based backflow as a powerful variational ansatz for strongly correlated electronic structure, achieving superior magnetic property predictions while maintaining chemical accuracy in total energies.
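As a rough illustration of the pipeline summarized in the abstract and in the Figure 1 caption, the NumPy sketch below tokenizes an occupation string, embeds the tokens, applies one self-attention layer, maps each token's features through its own linear map into blocks of $D$ single-particle orbital matrices, selects the occupied rows, and sums the determinants. It is not the QiankunNet implementation: the single attention layer, the linear per-token maps, every parameter name, the sizes, and the random initialization are assumptions made only for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(rng, n_tokens, token_size, d_model, n_elec, n_det):
    """Random toy parameters (hypothetical layout, not the paper's)."""
    vocab = 2 ** token_size  # one id per occupation pattern of a token
    s = 0.5
    return {
        "embed": s * rng.normal(size=(vocab, d_model)),
        "pos": s * rng.normal(size=(n_tokens, d_model)),
        "Wq": s * rng.normal(size=(d_model, d_model)),
        "Wk": s * rng.normal(size=(d_model, d_model)),
        "Wv": s * rng.normal(size=(d_model, d_model)),
        # One output map per token: features -> (n_det, token_size, n_elec) block.
        "W_out": s * rng.normal(size=(n_tokens, d_model, n_det * token_size * n_elec)),
    }

def backflow_amplitude(x, params, token_size, n_elec, n_det):
    """Toy psi(x): sum of n_det determinants of configuration-dependent orbitals."""
    x = np.asarray(x, dtype=np.int64)
    if int(x.sum()) != n_elec:
        raise ValueError("configuration does not contain n_elec electrons")
    n_so = x.size
    n_tokens = n_so // token_size
    d_model = params["embed"].shape[1]

    # 1. Segment the binary string into tokens and turn each into an integer id.
    groups = x.reshape(n_tokens, token_size)
    ids = groups @ (2 ** np.arange(token_size))

    # 2. Embedding lookup plus a learned positional term.
    h = params["embed"][ids] + params["pos"]

    # 3. One single-head self-attention layer with a residual connection
    #    (stand-in for the stack of Transformer encoder layers).
    q, k, v = h @ params["Wq"], h @ params["Wk"], h @ params["Wv"]
    attn = softmax(q @ k.T / np.sqrt(d_model))
    h = h + attn @ v

    # 4. Token-specific maps produce the orbital-matrix blocks, which are
    #    stacked into n_det matrices of shape (n_so, n_elec).
    blocks = np.einsum("td,tdo->to", h, params["W_out"])
    blocks = blocks.reshape(n_tokens, n_det, token_size, n_elec)
    spo = blocks.transpose(1, 0, 2, 3).reshape(n_det, n_so, n_elec)

    # 5. Keep the rows of occupied spin-orbitals and sum the determinants.
    occ = np.flatnonzero(x)
    return np.linalg.det(spo[:, occ, :]).sum()

rng = np.random.default_rng(0)
params = init_params(rng, n_tokens=4, token_size=2, d_model=8, n_elec=4, n_det=3)
x = [1, 1, 0, 1, 0, 0, 1, 0]  # 8 spin-orbitals, 4 electrons
amp = backflow_amplitude(x, params, token_size=2, n_elec=4, n_det=3)
print(amp)
```

Because the orbital matrices depend on the whole configuration through the attention layer, changing the occupation of one token changes every block, which is the sense in which the orbitals are "backflowed".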

Paper Structure

This paper contains 12 sections, 37 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Schematic computational workflow of QiankunNet. The procedure begins with a binary electron configuration string segmented into tokens representing local spin-orbital groups. Each token is embedded into a feature vector via a learnable lookup table (embedding layer), then processed through Transformer encoder layers to capture long-range electron correlations. The resulting features are mapped by token-specific MLPs to construct the single-particle orbital (SPO) matrix blocks, which are assembled into $D$ distinct SPO matrices $\tilde{A}_{\theta}^{(d)}(\mathbf{x})$. For each configuration $\mathbf{x}$, occupied orbitals are selected to form $N_e \times N_e$ matrices, and their determinants are summed to yield the final wavefunction $\psi(\mathbf{x})$.
  • Figure 2: Calculations for the [2Fe-2S] complex $\left[\mathrm{Fe}_2 \mathrm{S}_2\left(\mathrm{SCH}_3\right)_4\right]^{2-}$ using a CAS(30e,20o) active space. The plotted energies are averages over the final 1000 training steps. The green shaded area denotes the region within chemical accuracy (1 kcal/mol). (a) Optimization of the ground-state energy using QiankunNet, compared with the DMRG result. (b) Optimization of different spin states and the corresponding energy gap. (c) Magnetic exchange coupling constants $J$ derived from QiankunNet, compared with experimental values and other computational results.
  • Figure 3: Calculations for the [4Fe-4S] complex $\left[\mathrm{Fe}_4 \mathrm{S}_4\left(\mathrm{SCH}_3\right)_4\right]^{2-}$. (a) QiankunNet optimization of the [4Fe-4S] ground state with a CAS(54e,36o) active space. The energies are averages over the last 1000 steps of the training trajectory. The green shaded region marks chemical accuracy (1 kcal/mol). (b) Calculated spin correlations between the four iron atoms.
  • Figure 4: Schematic of the all-to-all collective communication used to transform the $\mathbf{O}$ matrix from a row-wise parallel to a column-wise parallel distribution across GPUs.
  • Figure 5: Multi-core local energy computation based on sampled and deterministically obtained coupled states. Each core processes a separate batch of configurations, focusing exclusively on coupling calculations and Hamiltonian matrix elements for its assigned configurations.
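The redistribution described in the Figure 4 caption can be illustrated with a single-process NumPy simulation. The sketch below is only a model of the communication pattern: the rank count, the matrix shape, and the helper names `all_to_all` and `rows_to_columns` are hypothetical, and no real GPU communication library is involved. Each simulated rank starts with a block of rows of $\mathbf{O}$, cuts it into one column chunk per destination, exchanges chunks, and stacks what it receives so that it ends up with a block of columns covering all rows.

```python
import numpy as np

def all_to_all(send):
    """Simulate the collective: recv[dst][src] is the chunk src sent to dst."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]

def rows_to_columns(row_shards):
    """Turn a row-wise distribution of a matrix into a column-wise one."""
    n = len(row_shards)
    # Each rank cuts its row block into one column chunk per destination rank.
    send = [np.array_split(shard, n, axis=1) for shard in row_shards]
    recv = all_to_all(send)
    # Each rank stacks the received chunks in source-rank order, which
    # restores the original row order inside its column block.
    return [np.concatenate(chunks, axis=0) for chunks in recv]

n_ranks = 4
O = np.arange(8 * 12, dtype=float).reshape(8, 12)  # toy stand-in for the O matrix
row_shards = np.array_split(O, n_ranks, axis=0)    # rank r holds rows 2r and 2r+1
col_shards = rows_to_columns(row_shards)           # rank r now holds 3 full columns

for r, shard in enumerate(col_shards):
    print(r, shard.shape)
```

After the exchange every rank holds all rows for its own subset of columns, so operations that need a complete column can run locally without further communication.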