
Attention-Based Foundation Model for Quantum States

Timothy Zaklama, Daniele Guerci, Liang Fu

TL;DR

The paper introduces Q-stage, an attention-based foundation model that learns a smooth map from Hamiltonian parameters to many-body ground-state wavefunctions by tokenizing Fock-basis configurations and conditionally injecting a parameter token. Trained on a small set of exact diagonalization states, the model generalizes across parameter space and reproduces ground-state energies and phase diagrams for the 1D and 2D t–V models with high fidelity, enabling fast inference across the Hamiltonian manifold. A key finding is that a single shared network can interpolate the wavefunction across ($V/t$, $N$) while equal-time observables follow post hoc from the learned wavefunction; limitations include difficulty with first-order transitions and excited states. The work demonstrates a scalable route to universal quantum-state representations that can extend to broader Hamiltonians and continuum settings, potentially enabling unified modeling of quantum matter across diverse systems.
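To make the input side of this description concrete, here is a minimal sketch of how fixed-particle-number Fock-basis configurations could be enumerated and paired with the conditioning parameters. It assumes one spinless orbital per site (occupation 0 or 1); the function names `fock_basis` and `model_inputs` and the `(V/t, N)` tuple layout are hypothetical illustrations, not the paper's actual tokenization code.

```python
from itertools import combinations


def fock_basis(num_sites, num_particles):
    """Enumerate occupation-number configurations with a fixed particle number.

    Assumption: each site holds 0 or 1 particle (spinless orbital per site).
    """
    configs = []
    for occupied in combinations(range(num_sites), num_particles):
        occupation = [0] * num_sites
        for site in occupied:
            occupation[site] = 1
        configs.append(tuple(occupation))
    return configs


def model_inputs(num_sites, num_particles, v_over_t):
    """Pair every configuration with the conditioning parameters.

    Hypothetical layout: the parameters are stored as a (V/t, N) tuple that a
    network could embed as a separate parameter token.
    """
    params = (float(v_over_t), float(num_particles))
    return [(config, params) for config in fock_basis(num_sites, num_particles)]


basis = fock_basis(4, 2)
inputs = model_inputs(4, 2, v_over_t=1.5)
```

Under the same assumption, the $L=16$, $N=8$ case used in the figures has $\binom{16}{8} = 12870$ configurations, so each wavefunction is a vector of that many complex amplitudes.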

Abstract

We present an attention-based foundation model architecture for learning and predicting quantum states across Hamiltonian parameters, system sizes, and physical systems. Using only basis configurations and physical parameters as inputs, our trained neural network is able to produce highly accurate ground-state wavefunctions. For example, we build the phase diagram for the 2D square-lattice $t$--$V$ model with $N$ particles, from only 18 parameters $(V/t, N)$. Thus, our architecture provides a basis for building a universal foundation model for quantum matter.

Paper Structure

This paper contains 18 sections, 26 equations, 10 figures.

Figures (10)

  • Figure 1: Architecture of the neural network wavefunction ansatz. The model first embeds an input Fock state, then passes the token through a self-attention block of finite depth, followed by a single cross-attention block, after which the token is pooled to the complex amplitude of that basis state. The complex amplitudes of all basis states are then aggregated into the wavefunction for a particular set of Hamiltonian parameters $\boldsymbol{\lambda}$, and the wavefunctions for all parameter sets are combined into a single loss function. Finally, gradient descent is performed with the AdamW optimizer, and training ends once a sufficient average overlap is reached.
  • Figure 2: Loss and cumulative overlap during training for the 2D square lattice ($L=16, N=8$). Despite training on a wide range of Hamiltonian parameters, the loss function asymptotically approaches 0 (total fidelity approaches 100$\%$). The non-interacting ground state is the hardest to learn; nevertheless, it is eventually learned and its fidelity asymptotically approaches 1.
  • Figure 3: Overlap generalization over parameter space ($V/t$) for $L=16, N=8$, in 1D (top) and 2D (bottom). Average overlap is trained until 99.99$\%$ fidelity is reached. Generalization exceeds 99.5$\%$ fidelity for all cases tested.
  • Figure 4: Energy generalization over parameter space ($V/t$) for $L=16, N=8$ in 2D. Energy agrees to 2 digits and has a percent error of under 1.5$\%$ for all cases. There is no significant difference between training and test cases. The right panel highlights the small-$V/t$ regime, which is harder to train but is still reproduced with high accuracy.
  • Figure 5: Charge excitation energy heatmap of 2D $t$--$V$ model ($L=16$). Black stars denote training points. The entire phase diagram is accurately predicted from just a handful of training points.
  • ...and 5 more figures
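
The captions above report overlaps (fidelities) between predicted and exact wavefunctions, and a loss that approaches 0 as the fidelity approaches 100%. A minimal sketch of one common way to compute such quantities follows. It assumes the fidelity is the normalized squared overlap $|\langle \psi_{\rm exact} | \psi_{\rm pred} \rangle|^2$ and that the loss is the mean infidelity over parameter sets; the paper's exact loss may differ, and the function names are hypothetical.

```python
def inner(a, b):
    """Complex inner product <a|b> of two amplitude vectors."""
    return sum(complex(x).conjugate() * complex(y) for x, y in zip(a, b))


def fidelity(psi_pred, psi_exact):
    """Normalized squared overlap; invariant under global phase and scale."""
    overlap = inner(psi_exact, psi_pred)
    norm = inner(psi_pred, psi_pred).real * inner(psi_exact, psi_exact).real
    return abs(overlap) ** 2 / norm


def mean_infidelity(pred_states, exact_states):
    """Assumed loss: average of (1 - fidelity) over all parameter sets."""
    losses = [1.0 - fidelity(p, e) for p, e in zip(pred_states, exact_states)]
    return sum(losses) / len(losses)


# Two toy "parameter sets": one prediction matches up to a global phase,
# the other is orthogonal to the exact state.
exact = [[1, 1], [0, 1]]
pred = [[1j, 1j], [1, 0]]
loss = mean_infidelity(pred, exact)
```

Because the overlap is normalized and taken in absolute value, a prediction that differs from the exact state only by a global phase or overall scale incurs no loss in this sketch.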