Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics

Eyup B. Unlu, Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva

TL;DR

This work constructs several variations of a quantum hybrid vision transformer for a classification problem in high-energy physics (distinguishing photons and electrons in the electromagnetic calorimeter) and tests them against classical vision transformer architectures. The results indicate that the hybrids can achieve comparable performance to their classical analogs with a similar number of parameters.

Abstract

Models based on vision transformer architectures are considered state-of-the-art when it comes to image classification tasks. However, they require extensive computational resources both for training and deployment. The problem is exacerbated as the amount and complexity of the data increase. Quantum-based vision transformer models could potentially alleviate this issue by reducing the training and operating time while maintaining the same predictive power. Although current quantum computers are not yet able to perform high-dimensional tasks, they do offer one of the most efficient solutions for the future. In this work, we construct several variations of a quantum hybrid vision transformer for a classification problem in high energy physics (distinguishing photons and electrons in the electromagnetic calorimeter). We test them against classical vision transformer architectures. Our findings indicate that the hybrid models can achieve comparable performance to their classical analogues with a similar number of parameters.

Paper Structure

This paper contains 11 sections, 2 equations, 9 figures, 1 table.

Figures (9)

  • Figure S1: The CMS coordinate system against the backdrop of the LHC, with the location of the four main experiments (CMS, ALICE, ATLAS and LHCb). The $z$-axis points to the Jura mountains, while the $y$-axis points toward the sky. In spherical coordinates, the components of a particle momentum $\vec{p}$ are its magnitude $|\vec{p}|$, the polar angle $\theta$ (measured from the $z$-axis), and the azimuthal angle $\varphi$ (measured from the $x$-axis). The transverse momentum $\vec{p}_T$ is the projection of $\vec{p}$ on the transverse ($xy$) plane. Figure generated with TikZ code adapted from Ref. CMS_Coordinate_System.
  • Figure S2: Four representative image grid examples from the dataset, in the $(\varphi,\eta)$ plane. The first row shows the image grids for the energy (normalized and displayed in log$_{10}$ scale), while the second row displays the timing information. The titles list the correct labels (real electron or real photon), as well as the corresponding labels predicted by one of the benchmark classical models (see text for more details).
  • Figure S3: The architecture for (a) the column-wise pooling and (b) the class-token models. For clarity, we use an MNIST image (Ref. MNIST_DATA) to demonstrate the process. The hybrid and the classical models differ in the architecture of their encoder layers (see Figures S4 and S5).
  • Figure S4: The classical encoder layer (a) and multi-head attention (b) architecture for the benchmark models.
  • Figure S5: The hybrid encoder layer (a) and multi-head attention (b) architecture for the hybrid models.
  • ...and 4 more figures
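
The kinematic variables named in the captions of Figures S1 and S2 can be illustrated with a short sketch. It converts Cartesian momentum components into the magnitude $|\vec{p}|$, the polar angle $\theta$, the azimuthal angle $\varphi$, and the transverse momentum $p_T$ as described in Figure S1. The pseudorapidity $\eta$ used for the image plane in Figure S2 is not defined in the captions; the sketch assumes the standard collider definition $\eta = -\ln\tan(\theta/2)$. The function name and the returned dictionary keys are hypothetical and are not taken from the paper.

```python
import math


def momentum_to_collider_coords(px, py, pz):
    """Convert Cartesian momentum (px, py, pz) to collider coordinates.

    Hypothetical helper for illustration only; not code from the paper.
    """
    p = math.sqrt(px * px + py * py + pz * pz)  # magnitude |p|
    pt = math.hypot(px, py)                     # projection on the xy plane
    theta = math.atan2(pt, pz)                  # polar angle from the z-axis, in [0, pi]
    phi = math.atan2(py, px)                    # azimuthal angle from the x-axis, in [-pi, pi]
    # Assumed standard definition: eta = -ln(tan(theta / 2)) = asinh(pz / pt).
    # It diverges along the beam (z) axis, so return +/- infinity there.
    if pt > 0.0:
        eta = math.asinh(pz / pt)
    else:
        eta = math.copysign(math.inf, pz)
    return {"p": p, "pt": pt, "theta": theta, "phi": phi, "eta": eta}


if __name__ == "__main__":
    # A particle emitted at 45 degrees to the beam axis, in the xz plane.
    print(momentum_to_collider_coords(1.0, 0.0, 1.0))
```

Using `asinh(pz / pt)` instead of evaluating `-log(tan(theta / 2))` directly gives the same value while avoiding a logarithm of a number close to zero for particles nearly parallel to the beam.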