Quantum-Inspired Self-Attention in a Large Language Model

Nikita Kuznetsov, Niyaz Ismagilov, Ernesto Campos

TL;DR

QISA, a classical quantum-inspired self-attention mechanism, is proposed and integrated into the full autoregressive language modeling pipeline of GPT-1. To the authors' knowledge, this is the first integration of this kind, as previous quantum self-attention mechanisms have been primarily tested on text classification.

Abstract

Recent advances in Natural Language Processing have been driven predominantly by transformer-based architectures, which rely heavily on self-attention mechanisms to model relationships between tokens in a sequence. Similarly, the field of Quantum Natural Language Processing, which seeks to leverage quantum principles to address challenges in language understanding and generation, has recently seen the development of quantum self-attention mechanisms. We propose a classical quantum-inspired self-attention (QISA) mechanism and integrate it into the full autoregressive language modeling pipeline of GPT-1. To the best of our knowledge, this is the first integration of this kind, as previous quantum self-attention mechanisms have been primarily tested on text classification. In our experiments, QISA outperforms standard self-attention on character error rate ($15.5\times$ better), word error rate ($4.7\times$ better), and cross-entropy loss ($13\times$ better), at the cost of a $2.6\times$ longer inference time.
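
For reference, the three metrics quoted in the abstract can be computed roughly as in the sketch below. It assumes the standard definitions: edit distance normalized by reference length for character and word error rate, and mean negative log-likelihood of the target tokens for cross-entropy. The paper's exact evaluation protocol is not given in this excerpt, and all function names here are illustrative.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate on whitespace-split words; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cross_entropy(target_probs):
    """Mean negative log-likelihood, given the model probability of each target token."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)
```

Lower is better for all three, so a "$15.5\times$ better" character error rate presumably means a value $15.5\times$ smaller than the baseline's.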

Paper Structure (12 sections, 17 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Left: Standard transformer block structure used in language models. Right: Modified multi-head self-attention where the standard value layer is replaced with a quantum-inspired one.
  • Figure 2: One layer of the hardware-efficient ansatz.
  • Figure 3: Cross-entropy versus iterations during the training of GPT-1 with CSA, QSANN (only for 1 head), QSANNv1, QSANNv2, and QISA. The top plot is for embedding size 4 and 1 head, the middle is for embedding size 16 and 1 head, and the bottom is for embedding size 16 and 4 heads.
  • Figure 4: Training and inference times for a batch size of 1024 on a single NVIDIA T4 GPU. All models use observables caching to increase their inference speeds.
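
The modification described in the Figure 1 caption, where only the value layer of self-attention is swapped out, can be illustrated with the minimal single-head sketch below. It is not the paper's QISA layer: `toy_quantum_inspired_value` is a hypothetical stand-in that uses the fact that a qubit prepared as $R_Y(\theta)|0\rangle$ has Pauli-$Z$ expectation $\cos\theta$, and every name, shape, and weight here is an assumption made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def classical_value(x, w_v):
    """Standard linear value projection."""
    return x @ w_v


def toy_quantum_inspired_value(x, w_theta):
    """Hypothetical stand-in, NOT the paper's layer.

    Each output feature is the Pauli-Z expectation of one qubit prepared
    as RY(theta)|0>, which equals cos(theta), with theta = x @ w_theta.
    """
    return np.cos(x @ w_theta)


def causal_attention(x, w_q, w_k, value_fn):
    """Single-head causal self-attention with a pluggable value map."""
    seq_len = x.shape[0]
    q, k = x @ w_q, x @ w_k
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions so token i attends only to tokens 0..i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ value_fn(x)


# Usage: 5 tokens with embedding size 4; the weights are random placeholders.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))

out_csa = causal_attention(x, w_q, w_k, lambda t: classical_value(t, w_v))
out_toy = causal_attention(x, w_q, w_k, lambda t: toy_quantum_inspired_value(t, w_v))
```

Passing the value map as a function leaves the query, key, and masking logic identical across both variants, which mirrors the caption's point that only the value layer changes.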