Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection

Donatella Genovese; Alessandro Sgroi; Alessio Devoto; Samuel Valentine; Lennox Wood; Cristiano Sebastiani; Stefano Giagu; Monica D'Onofrio; Simone Scardapane

TL;DR

The paper addresses the interpretability gap in graph-based analyses of collider data by introducing a Mixture-of-Experts Graph Transformer (MGT) that combines attention-based graph learning with expert specialization. The approach embeds intrinsic explainability via attention maps and gated expert routing, so that predictions can be traced back to physics-informed features. Evaluated on SUSY-like Monte Carlo data modeled after ATLAS analyses, the MGT achieves competitive accuracy and superior interpretability, with attention patterns and expert activations aligning with known physics signatures such as b-jet correlations and missing energy. This work demonstrates a pathway to trustworthy AI-assisted discoveries in high-energy physics by coupling high predictive performance with mechanistic interpretability.

Abstract

The Large Hadron Collider at CERN produces immense volumes of complex data from high-energy particle collisions, demanding sophisticated analytical techniques for effective interpretation. Neural Networks, including Graph Neural Networks, have shown promise in tasks such as event classification and object identification by representing collisions as graphs. However, while Graph Neural Networks excel in predictive accuracy, their "black box" nature often limits their interpretability, making it difficult to trust their decision-making processes. In this paper, we propose a novel approach that combines a Graph Transformer model with Mixture-of-Experts layers to achieve high predictive performance while embedding interpretability into the architecture. By leveraging attention maps and expert specialization, the model offers insights into its internal decision-making, linking predictions to physics-informed features. We evaluate the model on simulated events from the ATLAS experiment, focusing on distinguishing rare Supersymmetric signal events from Standard Model background. Our results show that the model achieves competitive classification accuracy while providing interpretable outputs that align with known physics, demonstrating its potential as a robust and transparent tool for high-energy physics data analysis. This approach underscores the importance of explainability in machine learning methods applied to high-energy physics, offering a path toward greater trust in AI-driven discoveries.
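To make the two interpretability handles named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It runs one single-head self-attention step over a fully connected event graph and then routes each node through a gated Mixture-of-Experts layer, returning the attention map and the per-node gate weights that one would inspect. Everything specific here is an assumption made for illustration: the node labels are borrowed from the Figure 2 caption, the weights are random, and the feature dimension, number of experts, top-2 routing and tanh experts are placeholders. It is written in plain Python so that it runs without dependencies.

```python
import math
import random

# Node labels taken from the event-graph caption; used here only as names.
NODES = ["j1", "j2", "j3", "b1", "b2", "l", "E"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def attention(X, Wq, Wk, Wv):
    """Single-head self-attention; every node attends to every node."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    scale = math.sqrt(len(Q[0]))
    # attn[i][j] is how strongly node i attends to node j.
    attn = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        for q in Q
    ]
    out = [
        [sum(a * v[d] for a, v in zip(row, V)) for d in range(len(V[0]))]
        for row in attn
    ]
    return out, attn


def moe_layer(H, Wg, experts, top_k=2):
    """Send each node to its top_k experts, weighted by renormalised gates."""
    outputs, gates = [], []
    for h in H:
        probs = softmax(matvec(Wg, h))
        chosen = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in chosen)
        weights = [probs[e] / norm if e in chosen else 0.0 for e in range(len(probs))]
        y = [0.0] * len(h)
        for e in chosen:
            expert_out = [math.tanh(v) for v in matvec(experts[e], h)]
            y = [yi + weights[e] * oi for yi, oi in zip(y, expert_out)]
        outputs.append(y)
        gates.append(weights)
    return outputs, gates


rng = random.Random(0)
dim, n_experts = 4, 3  # illustrative sizes, not taken from the paper
X = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in NODES]
Wq, Wk, Wv = (rand_matrix(dim, dim, rng) for _ in range(3))
Wg = rand_matrix(n_experts, dim, rng)
experts = [rand_matrix(dim, dim, rng) for _ in range(n_experts)]

H, attn = attention(X, Wq, Wk, Wv)
Y, gates = moe_layer(H, Wg, experts, top_k=2)

# Per-node expert weights: the kind of quantity an expert-specialization
# analysis would aggregate over many events.
for name, g in zip(NODES, gates):
    print(name, [round(w, 2) for w in g])
```

In this sketch each row of `attn` sums to one and can be read as a 7x7 attention map, while each row of `gates` shows which experts a node was sent to. With random weights the patterns carry no physics meaning; the point is only to show where the two interpretable quantities come from.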

Paper Structure (12 sections, 12 equations, 11 figures, 5 tables)

Related Works
Graph Neural Networks and Graph Transformers
Mixture of Experts
Explainability for High Energy Physics
Experimental setup
Dataset
Mixture of Experts Graph Transformer
Results
Discussion
Analysis of the Attention Heads
Analysis of Experts Specialization
Conclusion

Figures (11)

Figure 1: Overview of the proposed Transformer-based model architecture: the image illustrates an example of the model, which takes as input a graph. The graph is processed by the model, comprising various blocks: Att, which is the Multi-Head Attention block, and MoE blocks. The visualization includes attention maps derived from the Multi-Head Attention mechanism and the activation patterns of the experts for a single collision event example.
Figure 2: (a) Diagram of the SUSY signal process, showing chargino and neutralino decaying into W and Higgs bosons, with leptons, neutrinos, and b-quarks in the final state. (b) Schematic representation of the particle collision event modeled as a fully connected graph, highlighting the reconstructed particles: j1, j2, j3, b1, b2, l, E.
Figure 3: Distributions of (a) the lepton $p_T$ and (c) $E^{Miss}_{T}$, comparing the signal with the two main background processes.
Figure 4: Detailed illustration of the proposed architecture incorporating multi-head attention with an MoE block. The model begins with Laplacian positional encoding for the input features, followed by multi-head attention and normalization. This structure is repeated across L layers. The MoE block, driven by a gating network, assigns inputs dynamically to specialized expert networks. A classification head processes the final representation to produce predictions.
Figure 5: Attention maps for test set.
...and 6 more figures
