B-jet Tagging Using a Hybrid Edge Convolution and Transformer Architecture

Diego F. Vasquez Plaza; Vidya Manian

Abstract

Jet flavor tagging plays an important role in precision Standard Model measurements, enabling the extraction of the mass dependence of jet-quark interactions and quark-gluon plasma (QGP) interactions. It also helps infer the nature of particles produced in high-energy collisions that contain heavy quarks. The classification of bottom jets is vital for exploring new-physics scenarios in proton-proton collisions. In this research, we present a hybrid deep learning architecture that integrates edge convolutions with transformer self-attention in a single model, the Edge Convolution Transformer (ECT), for bottom-quark jet tagging. ECT processes track-level features (impact parameters, momentum, and their significances) alongside jet-level observables (vertex information and kinematics) to achieve state-of-the-art performance. The study uses the ATLAS simulation dataset. ECT achieves an AUC of 0.9333 for b-jet versus combined charm and light jet discrimination, surpassing ParticleNet (0.8904 AUC) and the pure transformer baseline (0.9216 AUC). The model maintains inference latency below 0.060 ms per jet on modern GPUs, meeting the stringent requirements for real-time event selection at the LHC. These results indicate that hybrid architectures combining local and global features offer superior performance on challenging jet classification tasks. In particular, ECT excels at charm-jet rejection, the most challenging task, while maintaining light-jet discrimination comparable to that of pure transformer models.
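To make the "local plus global" idea concrete, the following is a minimal pure-Python sketch of one edge convolution followed by one self-attention step over the tracks of a single jet. It is not the authors' ECT implementation: the ordering of the two stages, the toy track values, the weight matrix, the neighbour count `k`, and the identity query/key/value projections are all illustrative assumptions.

```python
import math

def knn(coords, k):
    """Indices of the k nearest neighbours of each point, excluding itself."""
    out = []
    for i, p in enumerate(coords):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        out.append([j for _, j in dists[:k]])
    return out

def edge_conv(feats, coords, weight, k):
    """Edge convolution: max over neighbours j of ReLU(W [x_i, x_j - x_i])."""
    neighbours = knn(coords, k)
    out = []
    for i, xi in enumerate(feats):
        messages = []
        for j in neighbours[i]:
            # Edge feature: the track itself concatenated with the offset to its neighbour.
            edge = xi + [a - b for a, b in zip(feats[j], xi)]
            messages.append([max(0.0, sum(w * e for w, e in zip(row, edge)))
                             for row in weight])
        # Channel-wise max aggregation over the k neighbours (local information).
        out.append([max(col) for col in zip(*messages)])
    return out

def self_attention(x):
    """Single-head scaled dot-product attention with identity projections (assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in x]
        m = max(scores)                      # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn = [e / z for e in exps]         # softmax weights over all tracks (global)
        out.append([sum(a * v[c] for a, v in zip(attn, x)) for c in range(d)])
    return out

# Hypothetical toy jet: 4 tracks, coordinates (eta, phi), features (pT, d0 significance).
coords = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [0.5, 0.5]]
feats = [[1.0, 0.5], [2.0, -0.3], [0.5, 3.0], [1.5, 0.1]]
weight = [[0.5, 0.0, 0.1, 0.0],    # arbitrary 3x4 matrix, not trained values
          [0.0, 0.5, 0.0, 0.1],
          [0.2, 0.2, -0.1, -0.1]]

local = edge_conv(feats, coords, weight, k=2)   # neighbourhood-level features
mixed = self_attention(local)                   # every track attends to every track
```

In this sketch the edge convolution only sees each track's `k` nearest neighbours in the (eta, phi) plane, whereas the attention step mixes information across all tracks in the jet, which is the complementary behaviour the abstract attributes to the hybrid design.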

Paper Structure (21 sections, 1 equation, 7 figures, 4 tables, 1 algorithm)

Introduction
Flavor Tagging Literature Review
Bottom-Jet Tagging Methodology
Notation
Dataset and Preprocessing
ATLAS Simulation Dataset
Feature Engineering
Track-Level Features
Jet-Level Features
Normalization and Preprocessing
Classification Tasks
Data Distributions
Architectural Overview
Training and Optimization
Evaluation Metrics
...and 6 more sections

Figures (7)

Figure 1: Distributions of jet-level features for $b$-jets (red), $c$-jets (green), and light jets (blue) in the ATLAS simulation dataset. Top left: Jet pseudorapidity ($\eta^{\mathrm{jet}}$). Top right: Jet invariant mass ($M^{\mathrm{jet}}$). Middle left: Jet azimuthal angle ($\phi^{\mathrm{jet}}$). Middle right: Jet transverse momentum ($p_{\mathrm{T}}^{\mathrm{jet}}$). Bottom: Jet flavor distribution. Solid lines represent the training set (62.2% of data), while dashed lines show the validation set (18.9%). All distributions are normalized to unit area for comparison. The broader $p_{\mathrm{T}}$ distribution of $b$-jets reflects the higher mass of bottom quarks ($m_b \approx 4.2$ GeV/$c^2$), while vertex multiplicity differences are evident in the flavor distribution.
Figure 2: Distributions of track-level features for charged particles associated with $b$-jets (red), $c$-jets (green), and light jets (blue). Top row: Track transverse momentum ($p_{\mathrm{T}}^{\mathrm{trk}}$), showing the $p_{\mathrm{T}}>1$ GeV/$c$ selection, and track electric charge distribution. Second row: Transverse ($d_0$) and longitudinal ($z_0$) impact parameters, demonstrating the displaced vertices characteristic of heavy-flavor decays. The pronounced tails for $b$- and $c$-jets arise from secondary vertex displacements ($c\tau_b \approx 460~\mu$m, $c\tau_c \approx 150~\mu$m), while light jets peak sharply near zero, consistent with prompt tracks from the primary vertex. Smaller impact-parameter uncertainties enable higher-significance discrimination of displaced vertices. Bottom row: Track pseudorapidity ($\eta^{\mathrm{trk}}$) and azimuthal angle ($\phi^{\mathrm{trk}}$), showing the angular distributions. Solid lines represent training data (7,624,594 tracks); dashed lines show validation data (2,312,445 tracks). All distributions are normalized to probability density. Heavy-flavor jets exhibit significantly broader impact parameter distributions, which provide the primary discriminative power for $b$-jet identification.
Figure 3: Architecture of the Edge Convolution Transformer (ECT) for $b$-jet tagging, showing complete information flow and tensor dimensions at each stage.
Figure 4: Training (Train) and validation (Val) metrics for the ECT model on the $b$ vs $c$+$light$ classification task over 100 epochs. Top left: Cross-entropy loss converges smoothly from 0.37 to 0.31 over the first 50 epochs, then stabilizes, indicating effective optimization without oscillations. Top right: Training (blue) and validation (orange) accuracy (acc) reach $\sim$87% with minimal gap (< 0.2%), demonstrating that the model generalizes well without overfitting. Bottom left: Area Under Curve (AUC) improves steadily from 0.92 to 0.93, plateauing around epoch 89 where early stopping was triggered (patience = 25). The small train-validation AUC gap confirms robust generalization. Bottom right: F1-score exhibits higher variance during early training due to threshold sensitivity, then stabilizes at 0.81, confirming balanced precision-recall trade-off. Dashed horizontal lines mark the maximum values achieved: training accuracy 87.2%, validation accuracy 87.0%, training AUC 0.930, validation AUC 0.928. These curves demonstrate stable convergence and effective regularization.
Figure 5: Misidentification rate versus signal efficiency for the ECT model across three $b$-jet classification tasks: $b$ vs. $c$ (orange), $b$ vs. $light$ (green), and $b$ vs. $c$+$light$ (blue). Horizontal dashed lines indicate standard working points used in ATLAS analyses: Loose (10% mistag rate), Medium (1%), and Tight (0.1%). The $b$ vs. $light$ discrimination achieves the best performance, reaching signal efficiencies above 65% even at the Tight working point, reflecting the distinct signatures of $light$ jets compared to heavy-flavor jets. The more challenging $b$ vs. $c$ task, where both jet types contain displaced vertices from heavy-hadron decays, shows reduced but still competitive performance. ECT achieves inference throughput of approximately 17,380 jets/s on a single NVIDIA RTX A5000 GPU, corresponding to less than 60 $\mu$s per jet, well within LHC High-Level Trigger latency requirements.
...and 2 more figures
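
The working points and throughput quoted in the Figure 5 caption can be related to raw classifier scores and timing with a few lines of arithmetic. The sketch below is an illustration only: the function names and the toy scores are hypothetical, and converting throughput to per-jet time gives an average over a batch, not a measured single-jet latency.

```python
import math

def latency_us(jets_per_second):
    """Average time per jet in microseconds implied by a throughput figure."""
    return 1e6 / jets_per_second

def efficiency_at_mistag(signal_scores, background_scores, mistag_rate):
    """Signal efficiency at the loosest cut whose background pass rate is <= mistag_rate."""
    bkg = sorted(background_scores, reverse=True)
    # Number of background jets allowed to pass the cut (small epsilon guards float rounding).
    n_allowed = math.floor(mistag_rate * len(bkg) + 1e-9)
    # A jet passes if its score is strictly above the threshold.
    threshold = bkg[n_allowed] if n_allowed < len(bkg) else float("-inf")
    passed = sum(1 for s in signal_scores if s > threshold)
    return passed / len(signal_scores)

# Throughput from the Figure 5 caption: about 17,380 jets/s.
per_jet = latency_us(17380)          # roughly 57.5 microseconds, below the quoted 60

# Hypothetical toy scores, not taken from the paper.
b_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
other_scores = [0.85, 0.6, 0.5, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
eff_loose = efficiency_at_mistag(b_scores, other_scores, 0.10)   # Loose: 10% mistag rate
```

With realistic sample sizes the same function can be evaluated at 0.10, 0.01, and 0.001 to read off the Loose, Medium, and Tight points of a misidentification-versus-efficiency curve.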
