A Survey of Quantum Transformers: Architectures, Challenges and Outlooks

Hui Zhang, Qinglin Zhao, Mengchu Zhou, Li Feng, Dusit Niyato, Shenggen Zheng, Lin Chen

TL;DR

This paper surveys the emerging field of quantum Transformers, addressing how to fuse classical Transformer architectures with quantum computing. It distinguishes two main implementation paradigms: methods based on parameterized quantum circuits (PQC), suited to noisy intermediate-scale quantum (NISQ) devices, and approaches based on quantum linear algebra (QLA), geared toward fault-tolerant quantum computing. It also provides a fine-grained taxonomy of the PQC-based subtypes (QKV-only quantum mapping, quantum pairwise attention, quantum holistic attention, and quantum-assisted optimization). The review consolidates architectural traits, empirical findings on small-scale quantum advantages, and a candid analysis of challenges such as complexity trade-offs, scalability, and trainability, together with proposed solutions and future directions. The work highlights both the potential of quantum Transformers to provide expressivity and speedups in specialized settings and the practical hurdles that must be overcome to realize scalable, real-world benefits.

Abstract

Quantum Transformers integrate the representational power of classical Transformers with the computational advantages of quantum computing. Since 2022, research in this area has rapidly expanded, giving rise to diverse technical paradigms and early applications. To address the growing need for consolidation, this paper presents the first comprehensive, systematic, and in-depth survey of quantum Transformer models. First, we delineate the research scope, focusing on improving Transformer parts with quantum methods, and introduce foundational concepts in classical Transformers and quantum machine learning. Then we organize existing studies into two main paradigms, PQC-based and QLA-based, with the PQC-based paradigm further divided into QKV-only Quantum Mapping, Quantum Pairwise Attention, Quantum Holistic Attention, and Quantum-Assisted Optimization, and we analyze their core mechanisms and architectural traits. We also summarize empirical results that demonstrate preliminary quantum advantages, especially on small-scale tasks or in resource-constrained settings. Following this, we examine key technical challenges, such as complexity-resource trade-offs, scalability and generalization limitations, and trainability issues including barren plateaus, and provide potential solutions, including quantumizing classical Transformer variants with lower complexity, hybrid designs, and improved optimization strategies. Finally, we propose several promising future directions, e.g., scaling quantum modules into large architectures, applying quantum Transformers to domains with inherently quantum data (e.g., physics, chemistry), and developing theory-driven designs grounded in quantum information science. This survey will help researchers and practitioners quickly grasp the overall landscape of current quantum Transformer research and promote future developments in this emerging field.

Paper Structure

This paper contains 27 sections, 24 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: The Strategic Roadmap of Quantum Transformers. The classical Transformer's $O(N^2)$ computational bottleneck has motivated the exploration of quantum solutions. These solutions have diverged into two main paradigms. The near-term, PQC-based path (blue stream) leverages NISQ-era hardware to achieve Quantum Enhanced Representation, leading to open questions about scalability to large-scale tasks and genuine algorithmic sub-quadratic speedup. The long-term, QLA-based path (red stream) targets fault-tolerant hardware with the goal of Theoretical Exponential Speedup, facing fundamental challenges in training and optimization. The figure maps the classification of existing research to this roadmap, highlighting current progress and future directions.
  • Figure 2: The application scenarios of Quantum Transformers
  • Figure 3: The structure of a Transformer block. It processes the input through a self-attention mechanism followed by a feed-forward network. Each sub-layer is wrapped with an Add & Norm operation. Specifically, the input is first passed through the self-attention layer (with QKV mapping, attention score computation, and value weighting); the result is then added to the original input and normalized. This output is further processed by a feed-forward layer, again followed by Add & Norm, producing the final feature maps.
  • Figure 4: (a) An example PQC structure. (b) The framework of a VQA.
  • Figure 5: The quantum kernel circuit.
  • ...and 4 more figures
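
The classical Transformer block described in the Figure 3 caption, and the $O(N^2)$ attention cost mentioned in Figure 1, can be illustrated with a minimal NumPy sketch. This is not code from the paper. It assumes a single attention head, a layer normalization without learnable gain or bias, a ReLU feed-forward layer, and randomly initialized weights; the names `transformer_block`, `Wq`, `Wk`, `Wv`, `W1`, and `W2` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Simplified normalization over the feature axis (no learnable scale/shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    # QKV mapping: project the N x d input into queries, keys, and values.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Attention scores: an N x N matrix, the source of the O(N^2) cost.
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Value weighting.
    attn = scores @ v
    # Add & Norm around the self-attention sub-layer.
    h = layer_norm(x + attn)
    # Feed-forward sub-layer (ReLU), followed by Add & Norm.
    ff = np.maximum(0.0, h @ W1) @ W2
    return layer_norm(h + ff), scores

# Illustrative sizes: N tokens, model width d, hidden width d_ff.
rng = np.random.default_rng(0)
N, d, d_ff = 4, 8, 16
x = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
W1 = rng.normal(size=(d, d_ff)) * 0.1
W2 = rng.normal(size=(d_ff, d)) * 0.1

out, scores = transformer_block(x, Wq, Wk, Wv, W1, W2)
print(out.shape, scores.shape)
```

The `scores` matrix has one entry per token pair, so its size grows quadratically with the sequence length `N`. In the survey's taxonomy, the PQC-based subtypes differ in which of these steps (the QKV mapping, the pairwise scores, or the attention computation as a whole) is handed to a quantum circuit.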