Compact Representation of Particle-Collision Events for Physics-Informed Machine Learning

Wasikul Islam, Sergei Chekanov

TL;DR

This work tackles the challenge of high-dimensional collider event representations by introducing RMM-C46, a 46-zone, physics-driven compression of the rapidity–mass matrix (RMM). By aggregating the RMM into physically interpretable blocks and applying additive or Frobenius-norm aggregation within each zone, RMM-C46 preserves the key kinematic structure while reducing each event from more than a thousand values to 46. This enables efficient classical ML and facilitates deployment on near-term quantum hardware. Empirical results on 13.6 TeV proton–proton Monte Carlo (MC) samples show that RMM-C46 matches or slightly exceeds the full RMM in supervised tasks and significantly improves unsupervised anomaly-detection performance, with Frobenius-based aggregation often yielding the best results. The approach offers a practical, quantum-ready representation for HL-LHC-era analyses, providing improved training efficiency, interpretability, and scalable integration with hybrid quantum–classical pipelines; code is publicly available in the Git repository c46git.
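Written out, the two aggregation modes take a simple form. In this sketch, $a_{ij}$ is an RMM cell, $A$ the full matrix, and $Z_k$ the set of cells assigned to zone $k$; the notation is assumed here for illustration, and the paper may additionally normalize the zone values:

```latex
c_k^{\mathrm{sum}} = \sum_{(i,j)\in Z_k} a_{ij},
\qquad
c_k^{\mathrm{frob}} = \Big(\sum_{(i,j)\in Z_k} a_{ij}^{2}\Big)^{1/2},
\qquad k = 1,\dots,46.

% If the zones Z_1,...,Z_46 are disjoint and together cover every cell of A:
\sum_{k=1}^{46} c_k^{\mathrm{sum}} = \sum_{i,j} a_{ij},
\qquad
\sum_{k=1}^{46} \big(c_k^{\mathrm{frob}}\big)^{2} = \lVert A \rVert_F^{2}.
```

Squaring before summing weights the largest cells in a zone more heavily than the plain sum does, which is one way the two variants can emphasize different aspects of the same block.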

Abstract

We introduce a compact, physics-driven event representation, RMM-C46, designed to compress the high-dimensional rapidity–mass matrix (RMM) into a low-dimensional, interpretable feature set suitable for physics-informed machine learning (ML) and quantum computing applications. The full RMM encodes detailed pairwise correlations among jets, b-jets, leptons, photons, and missing transverse energy, but it contains more than a thousand values per event, which makes it computationally heavy for large-scale training and incompatible with current low-qubit quantum devices. The proposed RMM-C46 input space preserves the physical block structure of the RMM through aggregated invariant-mass, rapidity-difference, and transverse-energy components, reducing the size of the original RMM by over an order of magnitude while maintaining interpretability. Applied to simulated proton–proton collisions at a centre-of-mass energy of 13.6 TeV, these representations match or exceed the discriminative performance of the full RMM in both supervised and unsupervised ML tasks. Their compactness, stability, and physics transparency also make them naturally compatible with near-term quantum machine learning architectures. RMM-C46 thus provides a scalable, efficient, and quantum-ready alternative to the full RMM for next-generation collider physics analyses.
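To make the block-wise compression concrete, here is a minimal NumPy sketch. Everything about the layout is an assumption for illustration: a 51×51 matrix with one row/column for missing transverse energy followed by five object types (assumed here to be jets, b-jets, muons, electrons, photons) with ten slots each, and a hypothetical zone scheme in which every off-diagonal type-pair block is one zone and every diagonal block is split into its diagonal, strict upper triangle, and strict lower triangle. With this layout the scheme yields 46 zones, but it is not claimed to be the exact RMM-C46 partition, and `zone_masks` and `compress` are hypothetical helper names.

```python
import numpy as np


def zone_masks(block_sizes):
    """Partition a square matrix into disjoint boolean zone masks.

    Hypothetical scheme (illustration only): each off-diagonal block
    (type i, type j) is one zone; each diagonal block is split into its
    diagonal, strict upper triangle and strict lower triangle. A 1x1
    diagonal block contributes a single zone.
    """
    dim = sum(block_sizes)
    edges = np.cumsum([0] + list(block_sizes))
    spans = list(zip(edges[:-1], edges[1:]))
    masks = []
    for i, (r0, r1) in enumerate(spans):
        for j, (c0, c1) in enumerate(spans):
            if i != j:
                parts = [np.ones((r1 - r0, c1 - c0), dtype=bool)]
            else:
                n = r1 - r0
                full = np.ones((n, n), dtype=bool)
                parts = [np.eye(n, dtype=bool)]
                if n > 1:
                    parts += [np.triu(full, k=1), np.tril(full, k=-1)]
            for part in parts:
                mask = np.zeros((dim, dim), dtype=bool)
                mask[r0:r1, c0:c1] = part
                masks.append(mask)
    return masks


def compress(rmm, masks, mode="sum"):
    """Aggregate each zone into one number: plain sum or Frobenius norm."""
    if mode == "sum":
        return np.array([rmm[m].sum() for m in masks])
    if mode == "frob":
        return np.array([np.sqrt(np.sum(rmm[m] ** 2)) for m in masks])
    raise ValueError(f"unknown mode: {mode!r}")


# Assumed layout: 1 MET row/column + 5 object types x 10 slots = 51 x 51.
sizes = [1] + [10] * 5
masks = zone_masks(sizes)
rmm = np.random.default_rng(0).random((51, 51))  # placeholder, not a real event
c46_sum = compress(rmm, masks, "sum")
c46_frob = compress(rmm, masks, "frob")
print(len(masks), rmm.size, round(rmm.size / len(masks), 1))  # 46 2601 56.5
```

Because the zones are disjoint and cover the matrix, the summed zones add up to the total of all cells and the squared Frobenius zones add up to the squared Frobenius norm of the whole matrix; what the compression discards is how values are distributed inside each zone, not their overall magnitude.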

Paper Structure (18 sections, 13 equations, 6 figures)

Figures (6)

  • Figure 1: Example of an RMM partitioned into 46 zones, which yield the 46 RMM-C46 variables.
  • Figure 2: Comparison of the mean RMM-C46 feature values for four datasets: Standard Model $t\bar{t}$, WZJets, $X\!\rightarrow\!HH$ (1500 GeV), and $X\!\rightarrow\!SH$ (1500 GeV). The Frobenius aggregation heatmap (C46-frob) shows the 46-zone partition of the RMM and highlights different aspects of the underlying block magnitudes.
  • Figure 3: Comparison of the autoencoder reconstruction-loss distributions obtained with RMM and RMM-C46 inputs, shown for different MC processes. Both autoencoders have identical architectures.
  • Figure 4: Comparison of ROC curves and their AUC values for signal MC vs $t\bar{t}$ background, after applying the selection cut on the reconstruction losses. They are computed using identical autoencoder architectures, with inputs provided either as RMM or RMM-C46.
  • Figure 5: Comparison of ROC curves and their AUC values for signal MC vs $t\bar{t}$ background, after applying the selection cut on the reconstruction losses. They are computed using identical variational autoencoder (VAE) architectures, with inputs provided either as RMM or RMM-C46.
  • ...and 1 more figure
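Figures 3 to 5 compare reconstruction-loss distributions and the resulting signal-versus-background separation. A minimal NumPy sketch of that evaluation step follows, assuming the per-event anomaly score is the mean squared reconstruction error and the AUC is computed as the probability that a random signal event scores above a random background event. The autoencoder itself is not shown, the helper names are hypothetical, the paper's exact cut-and-AUC procedure may differ, and the toy losses are illustrative numbers, not results.

```python
import numpy as np


def reconstruction_loss(x, x_hat):
    """Per-event mean squared error between inputs and (V)AE outputs."""
    return np.mean((np.asarray(x, float) - np.asarray(x_hat, float)) ** 2, axis=1)


def roc_auc(loss_sig, loss_bkg):
    """Area under the ROC curve from anomaly scores (Mann-Whitney form).

    Counts signal/background pairs where the signal loss is larger;
    ties count one half. Uses O(n_sig * n_bkg) memory, fine for a sketch.
    """
    s = np.asarray(loss_sig, float)[:, None]
    b = np.asarray(loss_bkg, float)[None, :]
    return float(np.mean((s > b) + 0.5 * (s == b)))


def efficiency_above(losses, threshold):
    """Fraction of events passing the anomaly cut: loss > threshold."""
    return float(np.mean(np.asarray(losses, float) > threshold))


# Toy losses for illustration only.
loss_bkg = np.array([0.1, 0.2, 0.3])
loss_sig = np.array([0.25, 0.4, 0.5])
print(roc_auc(loss_sig, loss_bkg))  # 8/9, about 0.889
cut = np.quantile(loss_bkg, 0.5)    # e.g. a cut at the background median
print(efficiency_above(loss_sig, cut), efficiency_above(loss_bkg, cut))
```

Ranking events by loss, rather than fixing one threshold, is what makes the AUC independent of where the cut is placed; the efficiencies at a chosen cut then give a single working point on that curve.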