
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention

Ethan N. Evans, Matthew Cook, Zachary P. Bradshaw, Margarite L. LaBorde

TL;DR

This work presents a variational quantum circuit architecture named Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), which builds networks of qubits that perform operations analogous to those of the transformer network, most notably the keystone self-attention operation, and that yields an exponential improvement in parameter complexity and run-time complexity over its classical counterpart.

Abstract

The rapid growth in the size of state-of-the-art machine learning models highlights a well-known issue: parameter counts that grow exponentially, reaching the trillions in the case of the Generative Pre-trained Transformer (GPT), lead to training-time and memory requirements that limit further advancement in the near term. The predominant models use the so-called transformer network and have a wide range of applications, including text and image prediction, classification, and even predicting solutions to the dynamics of physical systems. Here we present a variational quantum circuit architecture named Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), which builds networks of qubits that perform operations analogous to those of the transformer network, most notably the keystone self-attention operation, and that yields an exponential improvement in parameter complexity and run-time complexity over its classical counterpart. Our approach leverages recent insights from kernel-based operator learning for predicting spatiotemporal systems to represent deep layers of a vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms. To validate our approach, we consider image classification tasks both in simulation and on hardware, where with only 9 qubits and a handful of parameters we simultaneously embed and classify grayscale images of handwritten digits with high accuracy.
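The core idea in the abstract (mixing all positions of an image through a kernel applied between multi-dimensional quantum Fourier transforms) can be illustrated with a small classical statevector simulation. The sketch below is a simplified stand-in, not the authors' circuit. It assumes amplitude encoding of a 16×16 image into 8 data qubits (4 per image axis), and it replaces the variational $U_{kernel}(\theta)$ with a trainable diagonal phase in Fourier space. The names `qft2`, `iqft2`, `fourier_kernel_layer`, and `phases` are hypothetical. A QFT on the row register together with a QFT on the column register is a unitary 2D discrete Fourier transform, which is what NumPy's orthonormal FFT computes.

```python
import numpy as np

def qft2(psi_2d):
    """2D QFT on a (rows x cols) amplitude grid. The QFT sign convention
    exp(+2*pi*i*jk/N) with unitary scaling matches np.fft.ifft2(..., norm="ortho")."""
    return np.fft.ifft2(psi_2d, norm="ortho")

def iqft2(phi_2d):
    """Inverse 2D QFT (also unitary)."""
    return np.fft.fft2(phi_2d, norm="ortho")

def fourier_kernel_layer(image, phases):
    """Hypothetical mixing step: encode, QFT, diagonal phase kernel, inverse QFT."""
    psi = image.astype(complex)
    psi = psi / np.linalg.norm(psi)        # amplitude encoding (assumed, not from the paper)
    phi = qft2(psi)                        # move to Fourier space
    phi = phi * np.exp(1j * phases)        # trainable diagonal unitary (stand-in for U_kernel)
    return iqft2(phi)                      # back to pixel space

rng = np.random.default_rng(0)
img = rng.random((16, 16))                       # stand-in 16x16 grayscale image -> 8 data qubits
phases = rng.uniform(0, 2 * np.pi, (16, 16))     # hypothetical trainable phases
out = fourier_kernel_layer(img, phases)
print(np.isclose(np.linalg.norm(out), 1.0))      # True: every step is unitary
```

Because the kernel is diagonal in Fourier space, this layer acts as a circular convolution over pixel positions. Circularly shifting the input therefore shifts the output by the same amount, and every output amplitude depends on every input pixel. In this simplified picture, that global mixing is the role the kernel-based view assigns to self-attention. Under the 8-data-qubit assumption, one extra readout qubit would give the 9 qubits mentioned above, though the paper's actual encoding may differ.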

Paper Structure

This paper contains 10 sections, 20 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: The circuit applied in the context of classification, where a single readout qubit is used to solve a two-class classification problem.
  • Figure 2: The variational ansatz $U_{kernel}(\theta)$ used in the circuit. For $n$ qubits and $l$ layers, the parameters $\theta = [\alpha_1^1, \beta_1^1, \gamma_1^1, \dots, \alpha_n^l, \beta_n^l, \gamma_n^l]$ are updated as part of the optimization routine.
  • Figure 3: The variational ansatz $U_{p}(\theta)$ used in the circuit to perform classification tasks. The readout qubit $| r \rangle$ is conditionally rotated, controlled on the data qubits, and the parameters $\theta_i$, $i=1,\dots,4N$, are updated as part of the optimization routine.
  • Figure 4: Examples of synthetically generated noisy images with horizontal or vertical lines. The images are grayscale; color is added for visualization purposes only.
  • Figure 5: Training and validation loss as a function of training time (epochs) for the synthetic data on simulated hardware.
  • ...and 3 more figures
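Figures 2 and 3 describe the two trainable blocks. $U_{kernel}(\theta)$ carries three angles $(\alpha, \beta, \gamma)$ per qubit per layer, i.e. $3nl$ parameters. $U_p(\theta)$ rotates the readout qubit conditioned on the data qubits. The NumPy statevector sketch below illustrates that structure under assumptions that are not taken from the paper:

- the ZYZ rotation order,
- the CNOT chain between layers,
- one controlled-RY per data qubit in the readout block.

Figure 3 uses $4N$ parameters, and its exact gate layout is not given here. All function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def rz(a):
    return np.diag([np.exp(-0.5j * a), np.exp(0.5j * a)])

def ry(b):
    c, s = np.cos(b / 2), np.sin(b / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit statevector (qubit 0 = most significant)."""
    psi = state.reshape((2,) * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def apply_controlled(state, gate, c, t, n):
    """Apply `gate` to target qubit t only on the branch where control qubit c is |1>."""
    psi = state.reshape((2,) * n).copy()
    idx = [slice(None)] * n
    idx[c] = 1
    sub = psi[tuple(idx)]                  # control axis is dropped in this slice
    t_ax = t if t < c else t - 1           # position of the target axis inside `sub`
    psi[tuple(idx)] = np.moveaxis(np.tensordot(gate, sub, axes=([1], [t_ax])), 0, t_ax)
    return psi.reshape(-1)

def kernel_ansatz(state, theta, n_data, n_total, layers):
    """Hypothetical U_kernel: per layer, a ZYZ rotation on each data qubit, then a CNOT chain."""
    angles = np.asarray(theta).reshape(layers, n_data, 3)  # [alpha, beta, gamma] per qubit/layer
    for layer in range(layers):
        for i in range(n_data):
            a, b, g = angles[layer, i]
            state = apply_1q(state, rz(a) @ ry(b) @ rz(g), i, n_total)
        for i in range(n_data - 1):
            state = apply_controlled(state, X, i, i + 1, n_total)
    return state

def readout_prob(state, theta_p, n_data, n_total):
    """Hypothetical U_p: one controlled-RY per data qubit onto the readout qubit (the last one).
    Returns P(readout = |1>), usable as a two-class score."""
    r = n_total - 1
    for i in range(n_data):
        state = apply_controlled(state, ry(theta_p[i]), i, r, n_total)
    probs = np.abs(state.reshape(-1, 2)) ** 2   # last axis = readout qubit
    return probs[:, 1].sum()

n_data, layers = 8, 2
rng = np.random.default_rng(0)
psi = rng.normal(size=2**n_data) + 1j * rng.normal(size=2**n_data)
psi /= np.linalg.norm(psi)                                 # stand-in encoded data state
state = np.kron(psi, np.array([1, 0], dtype=complex))      # append readout qubit in |0>
theta_k = rng.uniform(0, 2 * np.pi, size=3 * n_data * layers)  # 3*n*l = 48 angles
theta_p = rng.uniform(0, np.pi, size=n_data)
state = kernel_ansatz(state, theta_k, n_data, n_data + 1, layers)
p1 = readout_prob(state, theta_p, n_data, n_data + 1)
print(len(theta_k), round(p1, 3))
```

The parameter vector is reshaped as (layers, qubits, 3), so its flat order matches the $[\alpha_1^1, \beta_1^1, \gamma_1^1, \dots, \alpha_n^l, \beta_n^l, \gamma_n^l]$ listing in Figure 2. In a training loop, `p1` would be compared with the binary label, and both `theta_k` and `theta_p` would be updated by a classical optimizer.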