Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection
Donatella Genovese, Alessandro Sgroi, Alessio Devoto, Samuel Valentine, Lennox Wood, Cristiano Sebastiani, Stefano Giagu, Monica D'Onofrio, Simone Scardapane
TL;DR
The paper addresses the interpretability gap in graph-based analyses of collider data by introducing a Mixture-of-Experts Graph Transformer (MGT) that combines attention-based graph learning with expert specialization. The approach embeds intrinsic explainability via attention maps and gated expert routing, so that predictions can be traced back to physics-informed features. Evaluated on SUSY-like Monte Carlo data modeled after ATLAS analyses, the MGT achieves competitive accuracy and superior interpretability, with attention patterns and expert activations aligning with known physics signatures such as b-jet correlations and missing energy. This work demonstrates a pathway to trustworthy AI-assisted discoveries in high-energy physics by coupling high predictive performance with mechanistic interpretability.
Abstract
The Large Hadron Collider at CERN produces immense volumes of complex data from high-energy particle collisions, demanding sophisticated analytical techniques for effective interpretation. Neural Networks, including Graph Neural Networks, have shown promise in tasks such as event classification and object identification by representing collisions as graphs. However, while Graph Neural Networks excel in predictive accuracy, their "black box" nature limits their interpretability, making it difficult to trust their decision-making processes. In this paper, we propose a novel approach that combines a Graph Transformer model with Mixture-of-Experts layers to achieve high predictive performance while embedding interpretability into the architecture. By leveraging attention maps and expert specialization, the model offers insights into its internal decision-making, linking predictions to physics-informed features. We evaluate the model on simulated events from the ATLAS experiment, focusing on distinguishing rare Supersymmetric signal events from Standard Model background. Our results show that the model achieves competitive classification accuracy while providing interpretable outputs that align with known physics, demonstrating its potential as a robust and transparent tool for high-energy physics data analysis. This approach underscores the importance of explainability in machine learning methods applied to high-energy physics, offering a path toward greater trust in AI-driven discoveries.
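To make the idea of gated expert routing concrete, the following is a minimal sketch of a Mixture-of-Experts layer applied to per-node embeddings. It is not the paper's implementation: the abstract does not specify the gating scheme, so the softmax gate, the top-k selection, the linear experts, and all names (`moe_layer`, `gate_weights`, `expert_weights`, `top_k`) are assumptions chosen for illustration. The returned routing record is the kind of quantity one could inspect to see which expert handled which node.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def moe_layer(nodes, gate_weights, expert_weights, top_k=1):
    """Route each node embedding to its top_k experts (assumed gating scheme).

    nodes:          list of node embeddings, each of length d_in
    gate_weights:   n_experts x d_in matrix producing one score per expert
    expert_weights: list of n_experts matrices, each d_out x d_in (linear experts)
    Returns (outputs, routing); routing[i] lists (expert_index, weight) for node i.
    """
    outputs, routing = [], []
    for h in nodes:
        probs = softmax(matvec(gate_weights, h))
        # Keep the top_k experts with the highest gate probability.
        chosen = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in chosen)
        out = [0.0] * len(expert_weights[0])
        for e in chosen:
            weight = probs[e] / norm  # renormalise over the selected experts
            y = matvec(expert_weights[e], h)
            out = [o + weight * yi for o, yi in zip(out, y)]
        outputs.append(out)
        routing.append([(e, probs[e] / norm) for e in chosen])
    return outputs, routing


# Toy example with random weights (hypothetical sizes: 5 nodes, 4 features, 3 experts).
random.seed(0)
d, n_experts = 4, 3
nodes = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
gate_weights = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
expert_weights = [
    [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    for _ in range(n_experts)
]
outputs, routing = moe_layer(nodes, gate_weights, expert_weights, top_k=1)
for i, assignment in enumerate(routing):
    print(f"node {i} -> expert {assignment[0][0]}")
```

In a collision graph, each node would correspond to a reconstructed object such as a jet or lepton, so a per-node routing table like this is one way expert specialization could be read off and compared against object types.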
