A Survey of Quantum Transformers: Architectures, Challenges and Outlooks
Hui Zhang, Qinglin Zhao, Mengchu Zhou, Li Feng, Dusit Niyato, Shenggen Zheng, Lin Chen
TL;DR
This paper surveys the emerging field of quantum Transformers, addressing how to fuse classical Transformer architectures with quantum computing. It distinguishes two main implementation paradigms—PQC-based methods suitable for NISQ devices and QLA-based approaches geared toward fault-tolerant quantum computing—and provides a fine-grained taxonomy of PQC-based subtypes (QKV-only mapping, quantum pairwise and holistic attention, and quantum-assisted optimization). The review consolidates architectural traits, empirical findings on small-scale quantum advantages, and a candid analysis of challenges such as complexity trade-offs, scalability, and trainability, offering proposed solutions and future directions. The work highlights both the potential of quantum Transformers to provide expressivity and speedups in specialized settings and the practical hurdles that must be overcome to realize scalable, real-world benefits.
Abstract
Quantum Transformers integrate the representational power of classical Transformers with the computational advantages of quantum computing. Since 2022, research in this area has rapidly expanded, giving rise to diverse technical paradigms and early applications. To address the growing need for consolidation, this paper presents the first comprehensive, systematic, and in-depth survey of quantum Transformer models. First, we delineate the research scope, focusing on improving Transformer parts with quantum methods, and introduce foundational concepts in classical Transformers and quantum machine learning. Then we organize existing studies into two main paradigms, PQC-based and QLA-based, with the PQC-based paradigm further divided into QKV-only Quantum Mapping, Quantum Pairwise Attention, Quantum Holistic Attention, and Quantum-Assisted Optimization, and we analyze their core mechanisms and architectural traits. We also summarize empirical results that demonstrate preliminary quantum advantages, especially on small-scale tasks or in resource-constrained settings. Following this, we examine key technical challenges, such as complexity-resource trade-offs, scalability and generalization limitations, and trainability issues including barren plateaus, and provide potential solutions, including quantumizing classical Transformer variants with lower complexity, hybrid designs, and improved optimization strategies. Finally, we propose several promising future directions, such as scaling quantum modules into large architectures, applying quantum Transformers to domains with inherently quantum data (e.g., physics, chemistry), and developing theory-driven designs grounded in quantum information science. This survey will help researchers and practitioners quickly grasp the overall landscape of current quantum Transformer research and promote future developments in this emerging field.
