Molecular Quantum Transformer

Yuichi Kamata, Quoc Hoan Tran, Yasuhiro Endo, Hirotaka Oshima

TL;DR

Numerical demonstrations show that in calculating ground-state energies for H2, LiH, BeH2, and H4, MQT outperforms the classical Transformer, highlighting the promise of quantum effects in Transformer structures.

Abstract

The Transformer model, renowned for its powerful attention mechanism, has achieved state-of-the-art performance in various artificial intelligence tasks but faces challenges such as high computational cost and memory usage. Researchers are exploring quantum computing to enhance the Transformer's design, though quantum Transformers have so far shown limited success on classical data. With a growing focus on leveraging quantum machine learning for quantum data, particularly in quantum chemistry, we propose the Molecular Quantum Transformer (MQT) for modeling interactions in molecular quantum systems. By utilizing quantum circuits to implement the attention mechanism on the molecular configurations, MQT can efficiently calculate ground-state energies for all configurations. Numerical demonstrations show that in calculating ground-state energies for H2, LiH, BeH2, and H4, MQT outperforms the classical Transformer, highlighting the promise of quantum effects in Transformer structures. Furthermore, pretraining on diverse molecular data enables MQT to learn new molecules efficiently, extending its applicability to complex molecular systems with minimal additional effort. Our method offers an alternative to existing quantum algorithms for estimating ground-state energies, opening new avenues in quantum chemistry and materials science.
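The ground-state energy targeted throughout is the smallest eigenvalue $E_0$ of the qubit Hamiltonian, and variational approaches such as MQT and VQE rest on the variational principle: for any state, $\braket{\psi|H|\psi}/\braket{\psi|\psi} \ge E_0$, with equality at the ground state. A minimal NumPy sketch of that bound, assuming a small random Hermitian matrix as a hypothetical stand-in for a molecular Hamiltonian (no chemistry is computed here):

```python
import numpy as np

def energy(h: np.ndarray, psi: np.ndarray) -> float:
    """Expectation <psi|H|psi> / <psi|psi> for a Hermitian matrix H (psi need not be normalized)."""
    return float(np.real(np.vdot(psi, h @ psi)) / np.real(np.vdot(psi, psi)))

rng = np.random.default_rng(0)
n_qubits = 3
dim = 2 ** n_qubits

# Hypothetical toy Hermitian "Hamiltonian"; NOT a molecular Hamiltonian.
a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
h = (a + a.conj().T) / 2

# Exact diagonalization: eigenvalues come back in ascending order.
evals, vecs = np.linalg.eigh(h)
e0 = evals[0]  # exact ground-state energy

# Any trial state gives an upper bound on e0 (variational principle).
trial = rng.normal(size=dim) + 1j * rng.normal(size=dim)
print(energy(h, trial) >= e0)  # True

# The ground-state eigenvector attains the bound.
print(np.isclose(energy(h, vecs[:, 0]), e0))  # True
```

A variational model only ever approaches $E_0$ from above, which is why lower estimated energies (smaller $\Delta E$) indicate a better ground-state approximation.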

Paper Structure

This paper contains 20 sections, 6 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Overview of the Molecular Quantum Transformer (MQT) model for ground-state energy calculation across various molecules and their configurations, and comparison with traditional methods. (a) For each molecule A, B, C,… and its associated configuration $\boldsymbol{r}_1, \boldsymbol{r}_2, \boldsymbol{r}_3, \ldots$, the MQT receives a corresponding classical feature sequence $\boldsymbol{x}^A(\boldsymbol{r}_1), \boldsymbol{x}^B(\boldsymbol{r}_2), \boldsymbol{x}^C(\boldsymbol{r}_3), \ldots$ through an embedding process. Leveraging a quantum attention mechanism, the MQT represents the complex interactions and correlations within the molecular system. The output of the MQT is a sequence of quantum states $\ket{\psi^A(\boldsymbol{r}_1)}, \ket{\psi^B(\boldsymbol{r}_2)}, \ket{\psi^C(\boldsymbol{r}_3)}, \ldots$, which encode these correlations as variational representations of the estimated ground states for $(A, \boldsymbol{r}_1), (B, \boldsymbol{r}_2), (C, \boldsymbol{r}_3), \ldots$, respectively. The corresponding Hamiltonians $H^A(\boldsymbol{r}_1), H^B(\boldsymbol{r}_2), H^C(\boldsymbol{r}_3), \ldots$, derived from quantum mechanics, are transformed into measurable operators and measured on $\ket{\psi^A(\boldsymbol{r}_1)}, \ket{\psi^B(\boldsymbol{r}_2)}, \ket{\psi^C(\boldsymbol{r}_3)}, \ldots$. During training, the optimization adjusts the variational parameters in both the MQT and the embedding process to minimize the expectation value $\braket{H(\boldsymbol{r})}_{\ket{\psi{(\boldsymbol{r})}}}$ across various molecules and a range of $\boldsymbol{r}$ values. In the evaluation phase, given a molecule, the MQT provides an estimate of the ground-state energy $E(\boldsymbol{r})$ for any configuration $\boldsymbol{r}$. (b) In contrast, traditional methods such as VQE or QPE require a separate, computationally expensive solver run for each molecule and configuration $\boldsymbol{r}$.
  • Figure 2: Structure of the Molecular Quantum Transformer (MQT). Starting with a molecule defined by atomic symbols and nuclear coordinates, the molecular Hamiltonian $H$ is constructed in a qubit-based representation using $n_q$ qubits. The MQT tokenizes the electronic state into an $n\times m \times d_\textup{emb}$ feature tensor, where $d_\textup{emb}$ is the embedding dimension and $n$ and $m$ are the numbers of electrons and nuclei, respectively. Tokens $\textup{elec}_{i}\textup{-nucl}_{j}$ ($i=1,\ldots,n$; $j=1,\ldots,m$) are processed by blocks $B_i$. Each block $B_i$ comprises $L$ layers, including an amplification module that scales features by the proton number $N_p$ and a Quantum Transformer module with shared trainable parameters. The outputs are aggregated, mapped by a fully connected (FC) module to the dimension of the $n_q$-qubit state vector, and combined with the Hartree-Fock (HF) state to produce the final state through amplitude embedding. The expectation value of the Hamiltonian measured on this final state is then minimized by optimizing the model's parameters.
  • Figure 3: Potential energy curves and estimation errors ($\Delta E$, inset plots) for varying interatomic bond lengths in (a) $\textup{H}_{2}$, (b) LiH, (c) $\textup{BeH}_{2}$, and (d) $\textup{H}_{4}$ molecules using the quantum (red lines) and classical (dotted blue lines) Transformers. The main plots show the exact results (gray line with circle markers) together with the MQT and classical-Transformer results averaged over nine trials; the curves overlap at the displayed scale.
  • Figure 4: Bar plot comparing the average ground-state energy estimation error of LiH over potential energy curves between the classical Transformer ($\overline{\Delta}_c$) and MQT ($\overline{\Delta}_q$) as a function of the token embedding dimension $d_\textup{emb}$ (4, 6, 8, 10, and 786). Error bars represent standard deviations. Teal-blue bars indicate classical results, while orange bars indicate quantum results, highlighting MQT’s consistently lower errors across $d_\textup{emb}$. The quantum result at $d_\textup{emb} = 786$ is not displayed due to computational resource limitations.
  • Figure 5: Energy estimation errors ($\Delta E$) over nine trials for MQT trained on LiH with only a few data points (few-shot learning), comparing models pretrained on $\textup{H}_{2}$, $\textup{BeH}_{2}$, and $\textup{H}_{4}$ (orange lines) with models without pretraining (dotted teal-blue lines) and with zero-shot learning (green line with circle markers) across varying bond lengths. Here, the zero-shot MQT is pretrained on $\textup{H}_{2}$, $\textup{BeH}_{2}$, and $\textup{H}_{4}$ but not fine-tuned on LiH. In few-shot learning, the pretrained MQT gives more accurate estimates than MQT models relying solely on zero-shot learning or solely on fine-tuning. It even outperforms the neural-network-based meta-VQE [cervera:2021:metaVQE] trained from scratch, with reductions in average error of approximately 19% and 11%, respectively.
  • ...and 3 more figures
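Figures 1 and 2 describe a single model with parameters shared across configurations: it maps each configuration $\boldsymbol{r}$ to amplitudes, combines the FC output with the HF state, normalizes the result (amplitude embedding), and is trained to minimize $\braket{H(\boldsymbol{r})}$ averaged over configurations. The sketch below is a purely classical NumPy illustration of that training objective under loudly stated assumptions: the Hamiltonian family $H(r) = H_0 + rV$ is a random toy, the feature map $[1, r, r^2]$ is a hypothetical stand-in for the embedding, quantum attention, and FC module (none of which is modeled), and "combined with the HF state" is assumed to mean adding the HF basis vector. Gradients come from the Rayleigh-quotient formula rather than from circuit measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
n_q = 2                 # toy qubit count (hypothetical)
dim = 2 ** n_q          # length of the n_q-qubit amplitude vector

def sym(m: np.ndarray) -> np.ndarray:
    """Symmetrize a real matrix so it is a valid (real) Hamiltonian."""
    return (m + m.T) / 2

# Hypothetical configuration-dependent Hamiltonian H(r) = H0 + r * V.
h0 = sym(rng.normal(size=(dim, dim)))
v_mat = sym(rng.normal(size=(dim, dim)))

def hamiltonian(r: float) -> np.ndarray:
    return h0 + r * v_mat

# Stand-in for the Hartree-Fock basis state (assumed to be |00>).
hf = np.zeros(dim)
hf[0] = 1.0

def features(r: float) -> np.ndarray:
    """Hypothetical stand-in for embedding + quantum attention + FC output."""
    return np.array([1.0, r, r * r])

def amplitudes(w: np.ndarray, r: float) -> np.ndarray:
    """Unnormalized amplitudes; 'combined with HF' is assumed to mean addition."""
    return hf + w @ features(r)

def energy_and_grad(h: np.ndarray, amp: np.ndarray):
    """Energy of the normalized state and its gradient w.r.t. the raw amplitudes."""
    norm2 = amp @ amp                        # normalization plays the role of amplitude embedding
    e = amp @ h @ amp / norm2                # Rayleigh quotient <psi|H|psi>
    grad = 2.0 * (h @ amp - e * amp) / norm2
    return e, grad

rs = np.linspace(0.5, 1.5, 5)   # training configurations (e.g. bond lengths)
w = np.zeros((dim, 3))          # parameters shared across every configuration

def mean_energy(w: np.ndarray) -> float:
    return float(np.mean([energy_and_grad(hamiltonian(r), amplitudes(w, r))[0] for r in rs]))

loss_before = mean_energy(w)
lr = 0.002
for _ in range(5000):
    g_w = np.zeros_like(w)
    for r in rs:
        _, g_amp = energy_and_grad(hamiltonian(r), amplitudes(w, r))
        # Chain rule through amp = hf + W @ phi(r): dE/dW = outer(dE/damp, phi).
        g_w += np.outer(g_amp, features(r)) / len(rs)
    w = w - lr * g_w
loss_after = mean_energy(w)

# Evaluation: query the trained model at each r and compare with exact diagonalization.
exact = np.array([np.linalg.eigvalsh(hamiltonian(r))[0] for r in rs])
learned = np.array([energy_and_grad(hamiltonian(r), amplitudes(w, r))[0] for r in rs])
delta_e = learned - exact       # per-configuration estimation error, Delta E >= 0
print(f"mean energy: {loss_before:.4f} -> {loss_after:.4f}; mean Delta E = {delta_e.mean():.4f}")
```

Because each state is normalized before its energy is taken, every $\Delta E$ is nonnegative by the variational principle. The shared `w` is what lets one trained model be queried at a configuration it was not trained on, e.g. `energy_and_grad(hamiltonian(1.2), amplitudes(w, 1.2))[0]`, instead of rerunning an optimizer per configuration as in Figure 1(b).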