Hits to Higgs: Reconstruction-Free Higgs Classification from Raw LHC Detector Data Using Higgsformers

Sascha Caron, Polina Moskvitina, Roberto Ruiz de Austri, Eugene Shalugin

TL;DR

This work investigates reconstruction-free Higgs classification, learning directly from raw LHC detector hits to distinguish tt̄H (H→bb̄) events from the tt̄ background. It compares a hit-level Higgsformer, a lightweight set-based Transformer, with object-level baselines (an MLP and ParT) trained on Delphes-reconstructed data, across varying dataset sizes and pileup conditions. The Higgsformer reaches an AUC of 0.792 at zero pileup, remains robust as pileup increases, and meaningfully outperforms simple hit-count baselines, although the object-level models still perform better in this prototype. The results highlight the feasibility and benefits of end-to-end hit-level learning, which offers substantial speedups and opens the way to reconstruction-free analyses in high-energy physics. Future work aims to scale up the training data and incorporate additional subdetectors.

Abstract

We present a comparative study of Higgs event classification at the Large Hadron Collider that bypasses the traditional reconstruction chain. As a benchmark, we focus on distinguishing $t\bar{t}H$ events with $H \to b\bar{b}$ from $t\bar{t}$ events, a particularly challenging task due to their similar final-state topologies. Our pipeline consists of event generation with Pythia8, fast detector simulation with ACTS/Fatras, and classification directly from raw detector hits. We show for the first time that a transformer model originally developed for inner-tracker hit-to-track assignment can be retrained to classify Higgs events directly from raw hits. For comparison, we reconstruct the same events with Delphes and train object-based classifiers, including multilayer perceptrons and the Particle Transformer. We evaluate both approaches under varying dataset sizes and pileup levels. Although the Higgsformer uses only inner-tracker hits (i.e., no calorimeter or muon information), it achieves strong performance, with an AUC of 0.792.
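
The key idea above, classifying a whole event from an unordered, variable-size set of hits, can be illustrated with a minimal sketch. It assumes each hit is reduced to its (x, y, z) position and uses a single untrained self-attention layer followed by mean pooling. The class name `ToySetAttentionClassifier`, the layer sizes, and the random weights are all hypothetical: this is not the Higgsformer architecture from the paper, only a demonstration of why per-hit embedding, self-attention, and symmetric pooling give an event score that does not depend on hit order.

```python
import math
import random
from typing import List, Sequence


def matvec(W: List[List[float]], v: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToySetAttentionClassifier:
    """Hypothetical toy model: one self-attention layer over a set of hits,
    mean-pooled to a single event logit. Weights are random and untrained;
    this is NOT the Higgsformer from the paper."""

    def __init__(self, d_in: int = 3, d_model: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            scale = 1.0 / math.sqrt(cols)
            return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

        self.d_model = d_model
        self.W_embed = init(d_model, d_in)  # per-hit embedding
        self.W_q = init(d_model, d_model)   # query projection
        self.W_k = init(d_model, d_model)   # key projection
        self.W_v = init(d_model, d_model)   # value projection
        self.w_out = init(1, d_model)[0]    # linear readout to one logit

    def logit(self, hits: List[Sequence[float]]) -> float:
        # Embed every hit independently; the set has no intrinsic order.
        h = [matvec(self.W_embed, hit) for hit in hits]
        q = [matvec(self.W_q, x) for x in h]
        k = [matvec(self.W_k, x) for x in h]
        v = [matvec(self.W_v, x) for x in h]
        scale = 1.0 / math.sqrt(self.d_model)
        updated = []
        for qi, hi in zip(q, h):
            # Scaled dot-product attention of one hit over all hits in the event.
            weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            attended = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.d_model)]
            updated.append([a + b for a, b in zip(hi, attended)])  # residual connection
        # Mean pooling over hits makes the event representation order-independent.
        pooled = [sum(u[d] for u in updated) / len(updated) for d in range(self.d_model)]
        return sum(w * p for w, p in zip(self.w_out, pooled))

    def prob_signal(self, hits: List[Sequence[float]]) -> float:
        """Sigmoid of the logit; after training this would serve as the signal score."""
        return 1.0 / (1.0 + math.exp(-self.logit(hits)))
```

Because attention is permutation-equivariant and the mean is symmetric, shuffling the hits of an event leaves its score unchanged, and events with different numbers of hits need no padding to a fixed length. Fitting the weights to labelled $t\bar{t}H$ and $t\bar{t}$ events is outside the scope of this sketch.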

Paper Structure

This paper contains 16 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: ROC curves (with AUC) on the test set for models trained on datasets of increasing size (10k, 20k, and 40k events).
  • Figure 2: Histograms of the classifier output (logit) for pileup levels 0, 5, and 20. Signal ($t\bar{t}H$) is shown in red and background ($t\bar{t}$) in blue.
  • Figure 3: ROC curves for a counts-only classifier using $n_{\text{hits}}$ at different pileup levels, evaluated on all 40k events.
  • Figure 4: AUC as a function of training set size for different model architectures. Higgsformer-small shows continued improvement with scale. The horizontal dashed line indicates the baseline performance at $\mathrm{AUC} = 0.616$.
  • Figure 5: Top-10 most important hits (3D view) for Higgsformer-small trained on 10k (left) and 40k (right) training events.
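
Figure 3 reports a counts-only classifier that uses just $n_{\text{hits}}$, and the other figures summarize performance by the ROC AUC. A minimal sketch of how a ROC curve and its AUC can be computed for such a one-number score is given below. It assumes labels of 1 for $t\bar{t}H$ and 0 for $t\bar{t}$ and that a higher score is more signal-like; the function names are illustrative, and the paper does not state how its baseline was implemented.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points from sweeping a threshold down through the scores.

    labels: 1 = signal (ttH), 0 = background (tt). Higher score = more signal-like.
    """
    n_sig = sum(labels)
    n_bkg = len(labels) - n_sig
    if n_sig == 0 or n_bkg == 0:
        raise ValueError("need at least one signal and one background event")
    # Visit events from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Events with tied scores pass or fail together, so add them as one step.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_bkg, tp / n_sig))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With the hit count as the score, `auc(roc_curve(n_hits, labels))` gives the counts-only AUC. Since a ROC curve depends only on the ordering of the scores, any strictly increasing function of $n_{\text{hits}}$ (for example a one-feature logistic regression with a positive coefficient) yields the same curve. Across a tie, the trapezoid counts each tied signal-background pair as one half, which matches the rank (Mann-Whitney) definition of AUC.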