Quantum Vision Transformers for Quark-Gluon Classification

Marçal Comajoan Cara; Gopal Ramesh Dahale; Zhongtian Dong; Roy T. Forestano; Sergei Gleyzer; Daniel Justice; Kyoungchul Kong; Tom Magorsch; Konstantin T. Matchev; Katia Matcheva; Eyup B. Unlu

TL;DR

The paper tackles quark–gluon jet classification under HL-LHC-scale data constraints by proposing a quantum-classical hybrid Vision Transformer (QViT) that embeds variational quantum circuits (VQCs) into both the attention and MLP components. The approach is evaluated on CMS Open Data jet images, where the QViT comes close to parity with a classical Vision Transformer of similar parameter count; the remaining gap of about 2% in AUC is likely due to optimization and expressivity limitations of the simulated VQCs. Key contributions include a concrete QViT design in which four 4-qubit VQCs replace linear projections in the multi-head attention (MHA) and MLP blocks, and a demonstration that simulated quantum components can approach classical performance on a realistic HEP task. The work offers a practical path toward quantum-assisted ML for high-energy physics, with plans to test on real quantum hardware, explore data augmentation and data re-uploading, and extend the hyperparameter search in pursuit of potential quantum advantages.
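To make the idea of a 4-qubit VQC standing in for a linear projection more concrete, the following is a minimal NumPy statevector sketch. It is an illustration under stated assumptions, not the circuit used in the paper: the gate layout (RX angle embedding, one trainable RY layer, a ring of CNOTs, Pauli-Z readout) is a generic choice, and the names `vqc`, `apply_1q`, and `apply_cnot` are hypothetical.

```python
import numpy as np

N_QUBITS = 4  # matches the 4-qubit circuits mentioned in the summary


def rx(angle):
    """Single-qubit rotation about the X axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(angle):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored with shape (2,)*N_QUBITS."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)


def apply_cnot(state, control, target):
    """Flip the target qubit on the part of the state where control is 1."""
    state = state.copy()
    idx = [slice(None)] * N_QUBITS
    idx[control] = 1
    # Indexing with an integer drops the control axis, shifting later axes by one.
    t_axis = target - 1 if target > control else target
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=t_axis).copy()
    return state


def vqc(x, theta):
    """Map 4 input features and 4 trainable angles to 4 values in [-1, 1].

    Assumed layout (illustrative only): RX embedding, RY layer, CNOT ring,
    then the Pauli-Z expectation value of each qubit.
    """
    state = np.zeros((2,) * N_QUBITS, dtype=complex)
    state[(0,) * N_QUBITS] = 1.0  # start in |0000>
    for q in range(N_QUBITS):
        state = apply_1q(state, rx(x[q]), q)  # encode the classical input
    for q in range(N_QUBITS):
        state = apply_1q(state, ry(theta[q]), q)  # trainable rotations
    for q in range(N_QUBITS):
        state = apply_cnot(state, q, (q + 1) % N_QUBITS)  # entangle in a ring
    probs = np.abs(state) ** 2
    expvals = []
    for q in range(N_QUBITS):
        other_axes = tuple(a for a in range(N_QUBITS) if a != q)
        p = probs.sum(axis=other_axes)  # marginal distribution of qubit q
        expvals.append(p[0] - p[1])  # <Z> = P(0) - P(1)
    return np.array(expvals)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.uniform(-np.pi, np.pi, size=N_QUBITS)
    weights = rng.uniform(-np.pi, np.pi, size=N_QUBITS)
    print(vqc(features, weights))
```

In a hybrid layer of this kind, `vqc` would take the place of a small dense projection: the four inputs play the role of a feature vector, the four angles in `theta` are the trainable parameters, and the four expectation values form the output vector. Training would additionally require gradients, for example via the parameter-shift rule or an automatic-differentiation framework, which this sketch omits.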

Abstract

We introduce a hybrid quantum-classical vision transformer architecture, notable for its integration of variational quantum circuits within both the attention mechanism and the multi-layer perceptrons. The research addresses the critical challenge of computational efficiency and resource constraints in analyzing data from the upcoming High Luminosity Large Hadron Collider, presenting the architecture as a potential solution. In particular, we evaluate our method by applying the model to multi-detector jet images from CMS Open Data. The goal is to distinguish quark-initiated from gluon-initiated jets. We successfully train the quantum model and evaluate it via numerical simulations. Using this approach, we achieve classification performance almost on par with the one obtained with the completely classical architecture, considering a similar number of parameters.

Paper Structure (10 sections, 16 equations, 8 figures)

Introduction
Background
(Classical) Deep Learning, the Transformer, and the Vision Transformer
Quantum Computing and Quantum Machine Learning
High-Energy Physics and Jets
Method
Data
Model
Results
Conclusions

Figures (8)

Figure S1: The CMS coordinate system against the backdrop of the LHC, with the location of the four main experiments (CMS, ALICE, ATLAS, and LHCb). The $z$-axis points to the Jura mountains, while the $y$-axis points toward the sky. In spherical coordinates, the components of a particle momentum $\vec{p}$ are its magnitude $|\vec{p}|$, the polar angle $\theta$ (measured from the $z$-axis), and the azimuthal angle $\varphi$ (measured from the $x$-axis). The transverse momentum $\vec{p}_T$ is the projection of $\vec{p}$ on the transverse ($xy$) plane. This figure was generated with TikZ code adapted from Ref. CMS_Coordinate_System.
Figure S2: Representative images of jets for both quarks (top) and gluons (bottom). The columns show the distinct sub-detectors: Tracks, ECAL, HCAL, and a composite image combining all three. All images are in log scale. Note that the ECAL and HCAL were upscaled to match the Tracks resolution.
Figure S3: Average images of quarks (top) and gluons (bottom) across the entire dataset. The columns show the distinct sub-detectors: Tracks, ECAL, HCAL, and a composite image combining all three. All images are in log scale. Note the more dispersed nature of the gluon jets across channels.
Figure S4: Model overview. QMHA stands for quantum multi-head attention and QMLP for quantum multi-layer perceptron. The illustration was inspired by Ref. vit, the major difference being that here we use a quantum transformer encoder, as depicted on the right side of the figure.
Figure S5: Variational quantum circuits used in the proposed QViT.
...and 3 more figures
