MEDIC: a network for monitoring data quality in collider experiments

Juvenal Bassa, Arghya Chattopadhyay, Sudhir Malik, Mario Escabi Rivera

TL;DR

This work addresses the challenge of data quality monitoring in high-energy collider experiments by proposing MEDIC, a simulation-driven, end-to-end neural network that detects and localizes detector glitches directly from particle-level outputs. Leveraging a modified Delphes simulator to create labelled glitch scenarios, MEDIC employs permutation-invariant embeddings for tracks and towers, a dedicated MET pathway, and a 2D convolutional classifier to process a sliding window of events, optimized with a KL-divergence loss and ensemble evaluation. The results show strong multi-class and binary anomaly detection performance, with the best results at a window size of $\mathcal{W}=30$, highlighting the potential for fast, online DQM that does not rely on manually certified reference data. This approach offers a flexible, reproducible foundation for ML-based monitoring that can adapt to HL-LHC upgrades and future detector architectures, while acknowledging the need for more detailed simulations and integration with real detector outputs.
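To make the sliding-window and KL-divergence ingredients concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the window stride or how the target distribution is built, so the stride of one event and the one-hot target below are assumptions, and the function names are hypothetical.

```python
import math


def sliding_windows(events, window):
    """Return every run of `window` consecutive events (stride of 1 is an assumption)."""
    return [events[i:i + window] for i in range(len(events) - window + 1)]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions over the same classes."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0  # terms with zero target probability contribute nothing
    )


# Stand-in event stream: integers replace real per-event detector features.
events = list(range(100))
windows = sliding_windows(events, window=30)

# Hypothetical two-class case: a one-hot target against two candidate predictions.
perfect = kl_divergence([1.0, 0.0], [1.0, 0.0])
uncertain = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

The loss is zero when the predicted class probabilities match the target and grows as they diverge, which is what makes it usable as a training objective for a classifier that outputs probabilities.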

Abstract

Data Quality Monitoring (DQM) is a crucial component of particle physics experiments and ensures that the recorded data is of the highest quality, and suitable for subsequent physics analysis. Due to the extreme environmental conditions, unprecedented data volumes, and the sheer scale and complexity of the detectors, DQM orchestration has become a very challenging task. Therefore, the use of Machine Learning (ML) to automate anomaly detection, improve efficiency, and reduce human error in the process of collecting high-quality data is unavoidable. Since DQM relies on real experimental data, it is inherently tied to the specific detector substructure and technology in operation. In this work, a simulation-driven approach to DQM is proposed, enabling the study and development of data-quality methodologies in a controlled environment. Using a modified version of Delphes -- a fast, multi-purpose detector simulation -- the preliminary realization of a framework is demonstrated which leverages ML to identify detector anomalies as well as localize the malfunctioning components responsible. We introduce MEDIC (Monitoring for Event Data Integrity and Consistency), a neural network designed to learn detector behavior and perform DQM tasks to look for potential faults. Although the present implementation adopts a simplified setup for computational ease, where large detector regions are deliberately deactivated to mimic faults, this work represents an initial step toward a comprehensive ML-based DQM framework. The encouraging results underline the potential of simulation-driven studies as a foundation for developing more advanced, data-driven DQM systems for future particle detectors.

Paper Structure

This paper contains 10 sections, 7 equations, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: Tower eta distribution for different scenarios.
  • Figure 2: Tower eta distribution for each of the $4$ detector simulations.
  • Figure 3: Schematic representation of MEDIC neural network architecture. The model treats the three branches (Tracks, Towers, and MET) in three separate channels that encode detector inputs through linear projections, transformer encoders, and attention pooling. This is followed by a series of convolutional layers, then a global average pooling and a fully connected classifier to return probabilities.
  • Figure 4: Training metrics for $\mathcal{W}=30$ across all epochs and $5$ folds.
  • Figure 5: Evaluation of MEDIC on test data for $\mathcal{W}=30$. The top row shows multi-class performance and the bottom row shows binary performance.
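
The Figure 3 caption mentions attention pooling, which is one way to obtain the permutation-invariant embeddings described for tracks and towers. The sketch below is a generic illustration rather than the MEDIC layer itself: the fixed `query` vector, the two-dimensional toy features, and the function name are assumptions, and a real model would learn the query and operate on transformer-encoded inputs.

```python
import math


def attention_pool(items, query):
    """Softmax-weighted average of item vectors, scored by dot product with `query`."""
    scores = [sum(x * q for x, q in zip(item, query)) for item in items]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * item[d] for w, item in zip(weights, items)) / total
        for d in range(len(query))
    ]


# Toy "tracks": each is a hypothetical 2-feature vector.
tracks = [[0.5, 1.0], [2.0, -1.0], [0.1, 0.3]]
query = [1.0, 0.5]  # stands in for a learned parameter

pooled = attention_pool(tracks, query)
pooled_reordered = attention_pool(list(reversed(tracks)), query)
```

Because the weights depend only on each item's own score and the result is a weighted sum, reordering the tracks leaves the pooled vector unchanged up to floating-point rounding, so a variable-length, unordered set of objects maps to one fixed-size embedding.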