Retentive Neural Quantum States: Efficient Ansätze for Ab Initio Quantum Chemistry

Oliver Knitter, Dan Zhao, James Stokes, Martin Ganahl, Stefan Leichenauer, Shravan Veerapaneni

TL;DR

This paper explores the retentive network (RetNet), a recurrent alternative to transformers, as an ansatz for solving electronic ground state problems in ab initio quantum chemistry. The findings support the RetNet as a means of improving the time complexity of NQS without sacrificing accuracy.

Abstract

Neural-network quantum states (NQS) have emerged as a powerful application of quantum-inspired deep learning for variational Monte Carlo methods, offering a competitive alternative to existing techniques for identifying ground states of quantum problems. A significant advancement toward improving the practical scalability of NQS has been the incorporation of autoregressive models, most recently transformers, as variational ansätze. Transformers learn sequence information with greater expressiveness than recurrent models, but at the cost of increased time complexity with respect to sequence length. We explore the use of the retentive network (RetNet), a recurrent alternative to transformers, as an ansatz for solving electronic ground state problems in $\textit{ab initio}$ quantum chemistry. Unlike transformers, RetNets overcome this time complexity bottleneck by processing data in parallel during training, and recurrently during inference. We give a simple computational cost estimate of the RetNet and directly compare it with similar estimates for transformers, establishing a clear threshold ratio of problem-to-model size past which the RetNet's time complexity outperforms that of the transformer. Though this efficiency can come at the expense of decreased expressiveness relative to the transformer, we overcome this gap through training strategies that leverage the autoregressive structure of the model, namely variational neural annealing. Our findings support the RetNet as a means of improving the time complexity of NQS without sacrificing accuracy. We provide further evidence that the ablative improvements of neural annealing extend beyond the RetNet architecture, suggesting it would serve as an effective general training strategy for autoregressive NQS.

Paper Structure

This paper contains 13 sections, 18 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: (a) A diagram of the transformer architecture originally featured in Vaswani et al. (2017), reproduced with permission. (b) An NQS ansatz incorporating the RetNet architecture. We note the dual structure of the retention module, a proxy for attention that underpins the RetNet architecture (Sun et al., 2023). Retention may be computed in both parallel (left) and recurrent (right) forms: this dual formulation is what allows the RetNet to achieve the same ease of training as transformers, while performing inference as quickly as RNNs.
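
The dual formulation of retention described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It assumes a simplified single-head retention with a scalar decay `gamma`, and it omits the positional rotation, normalization, gating, and multi-scale heads of the full RetNet, so it is not the ansatz used in the paper. The function names `retention_parallel` and `retention_recurrent` are hypothetical, chosen only for this example.

```python
import numpy as np


def retention_parallel(q, k, v, gamma):
    """Parallel form: (Q K^T * D) V with causal decay mask D[n, m] = gamma**(n - m) for n >= m."""
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # diff[n, m] = n - m
    # Zero out future positions (m > n); decay past positions geometrically.
    decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * decay) @ v


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_t = gamma * S_{t-1} + k_t^T v_t, output o_t = q_t S_t."""
    d_k, d_v = k.shape[1], v.shape[1]
    state = np.zeros((d_k, d_v))  # fixed-size state, independent of sequence length
    out = np.empty((q.shape[0], d_v))
    for t in range(q.shape[0]):
        state = gamma * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


# Illustrative random inputs: sequence length 6, head dimension 4 (assumed values).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
same = np.allclose(retention_parallel(q, k, v, 0.9), retention_recurrent(q, k, v, 0.9))
print("parallel and recurrent forms agree:", same)
```

In this sketch the parallel form builds an n-by-n score matrix in one pass, which suits training, while the recurrent form carries only a fixed-size state from one token to the next, which suits autoregressive sampling.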