
Autoencoder-based Anomaly Detection System for Online Data Quality Monitoring of the CMS Electromagnetic Calorimeter

The CMS ECAL Collaboration

TL;DR

This work addresses the challenge of real-time data-quality monitoring for the CMS ECAL by introducing a semi-supervised, autoencoder-based anomaly detection system that operates on occupancy images. The method applies spatial-response and time-evolution corrections that substantially improve detection efficiency while keeping false alarms low, capturing nearly 100% of anomalies at a low false discovery rate (FDR). The system is trained on abundant good-quality data, its alarm thresholds are calibrated with fake anomalies, and it is validated on real anomalies from 2018 and 2022, localizing them at the tower level. It has been deployed in Run 3 within the CMS software framework (CMSSW) using ONNX Runtime, producing real-time ML quality plots that complement the traditional DQM, detect degrading channels, and can be adapted to other detector subsystems or experiments.
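The post-processing described above can be illustrated with a minimal sketch. It assumes the autoencoder output is already available as a tower-level reconstruction of the occupancy map, that the spatial correction divides each tower's loss by that tower's average loss on good data, and that the time correction multiplies the spatially corrected loss maps of consecutive time intervals. The squared-error loss, the function names, and the threshold are illustrative assumptions, not code from CMSSW.

```python
from typing import List, Tuple

Map = List[List[float]]  # tower-level 2D map: rows x columns


def loss_map(occupancy: Map, reconstruction: Map) -> Map:
    """Per-tower squared error between the input map and the autoencoder output."""
    return [
        [(x - y) ** 2 for x, y in zip(row_in, row_out)]
        for row_in, row_out in zip(occupancy, reconstruction)
    ]


def spatial_correction(loss: Map, mean_good_loss: Map, eps: float = 1e-9) -> Map:
    """Divide each tower's loss by its average loss on good data (assumed form).

    Towers that are always reconstructed poorly are down-weighted, so one global
    threshold can be applied across the whole map.
    """
    return [
        [value / (mean + eps) for value, mean in zip(row_l, row_m)]
        for row_l, row_m in zip(loss, mean_good_loss)
    ]


def time_correction(corrected_maps: List[Map]) -> Map:
    """Multiply spatially corrected loss maps from consecutive intervals (assumed form).

    A persistent problem stays large in the product, while a fluctuation that
    appears in only one interval is suppressed by the small values in the others.
    """
    product = [[1.0] * len(row) for row in corrected_maps[0]]
    for m in corrected_maps:
        product = [[p * v for p, v in zip(row_p, row_m)] for row_p, row_m in zip(product, m)]
    return product


def flag_towers(score: Map, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of every tower whose final score exceeds the threshold."""
    return [(i, j) for i, row in enumerate(score) for j, s in enumerate(row) if s > threshold]


# Toy 2x2 example with three time intervals (all numbers are made up).
reconstruction = [[10.0, 10.0], [10.0, 10.0]]
mean_good_loss = [[1.0, 1.0], [1.0, 4.0]]  # tower (1, 1) is naturally hard to reconstruct
occupancies = [
    [[12.0, 13.0], [10.0, 12.0]],  # tower (0, 1) spikes only in this interval
    [[12.0, 10.5], [10.0, 12.0]],
    [[12.0, 10.5], [10.0, 12.0]],
]
corrected = [spatial_correction(loss_map(o, reconstruction), mean_good_loss) for o in occupancies]
score = time_correction(corrected)
flagged = flag_towers(score, threshold=10.0)  # only the persistent anomaly at (0, 0) survives
```

Towers (0, 0) and (1, 1) have the same raw loss in every interval, but only (0, 0) is flagged once its loss is compared with its usual behaviour on good data; the one-interval spike at (0, 1) is suppressed by the time product.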

Abstract

The CMS detector is a general-purpose apparatus that detects high-energy collisions produced at the LHC. Online Data Quality Monitoring of the CMS electromagnetic calorimeter is a vital operational tool that allows detector experts to quickly identify, localize, and diagnose a broad range of detector issues that could affect the quality of physics data. A real-time autoencoder-based anomaly detection system using semi-supervised machine learning is presented, enabling the detection of anomalies in the CMS electromagnetic calorimeter data. A novel method is introduced that maximizes the anomaly detection performance by exploiting the time-dependent evolution of anomalies as well as spatial variations in the detector response. The autoencoder-based system detects anomalies efficiently while maintaining a very low false discovery rate. The performance of the system is validated with anomalies found in 2018 and 2022 LHC collision data. Additionally, the first results from deploying the autoencoder-based system in the CMS online Data Quality Monitoring workflow at the beginning of Run 3 of the LHC are presented, showing its ability to detect issues missed by the existing system.
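The false discovery rate is the fraction of flagged towers that are in fact good. A minimal sketch of how such a rate could drive threshold calibration with fake anomalies is shown below; the scan over candidate thresholds, the `>=` flagging convention, the tie-breaking rule, and all names are assumptions for illustration, not the procedure used in the paper.

```python
from typing import List, Optional, Tuple


def rates(good_scores: List[float], anomaly_scores: List[float], threshold: float) -> Tuple[float, float]:
    """Return (efficiency, FDR) when every score >= threshold is flagged.

    efficiency = TP / (TP + FN): fraction of fake-anomaly towers that are flagged
    FDR        = FP / (FP + TP): fraction of flagged towers that are actually good
    """
    tp = sum(s >= threshold for s in anomaly_scores)
    fp = sum(s >= threshold for s in good_scores)
    efficiency = tp / len(anomaly_scores)
    fdr = fp / (fp + tp) if (fp + tp) > 0 else 0.0
    return efficiency, fdr


def calibrate_threshold(
    good_scores: List[float], anomaly_scores: List[float], max_fdr: float
) -> Optional[float]:
    """Highest-efficiency threshold whose FDR does not exceed max_fdr.

    Ties in efficiency go to the larger threshold, which flags fewer good towers.
    Returns None if no candidate meets the FDR target.
    """
    best: Optional[Tuple[float, float]] = None  # (efficiency, threshold)
    for t in sorted(set(good_scores) | set(anomaly_scores)):
        eff, fdr = rates(good_scores, anomaly_scores, t)
        if fdr <= max_fdr and (best is None or eff >= best[0]):
            best = (eff, t)
    return None if best is None else best[1]


# Made-up anomaly scores for towers in good data and towers with injected fake anomalies.
good = [0.1, 0.2, 0.3, 0.5, 2.0]
fake = [1.5, 3.0, 4.0, 5.0]
print(calibrate_threshold(good, fake, max_fdr=0.2))  # 1.5
print(calibrate_threshold(good, fake, max_fdr=0.1))  # 3.0
```

Tightening the FDR target from 0.2 to 0.1 raises the threshold and lowers the efficiency on the fake anomalies from 1.0 to 0.75, which is the trade-off such a calibration has to balance.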

Paper Structure

This paper contains 24 sections, 3 equations, 19 figures, and 2 tables.

Figures (19)

  • Figure 1: Schematic view of the CMS detector and its various subdetectors.
  • Figure 2: Schematic view of the ECAL showing the cylindrical barrel closed by the two endcap regions with one half endcap displayed.
  • Figure 3: Example histograms from the ECAL DQM, with (a) and (b) showing the distribution of the RMS of the pedestal values in the barrel and EE+, respectively, drawn at tower-level granularity. Diagrams (c) and (d) show the corresponding quality maps for the two regions, drawn at channel-level granularity, after a set of cuts is applied on the noise values shown in (a) and (b).
  • Figure 4: Channel status maps used in the ECAL DQM indicating the known problematic channels, color coded by error type, for (a) EB, (b) EE+, and (c) EE−.
  • Figure 5: DQM quality plots with different anomalies shown in red, while the towers with known issues show up as dark brown or dark yellow. (a) EB−03 turned off due to a voltage failure and seen in red. (b) Anomaly in EE+04 (marked in red) originating from an electronics failure affecting 200 channels.
  • ...and 14 more figures
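The quality maps in Figures 3 to 5 come from cuts on per-channel noise combined with a list of known problematic channels. A minimal sketch of that classification is shown below; the cut values, the lower cut for channels with no noise at all, the three-state labels, and all names are hypothetical and not taken from the ECAL DQM code.

```python
from enum import Enum
from typing import Dict, Set, Tuple

Channel = Tuple[int, int]  # hypothetical (ix, iy) position of a crystal in a map


class Quality(Enum):
    GOOD = "good"
    BAD = "bad"          # newly failing a noise cut
    KNOWN_BAD = "known"  # already listed in the channel status map


def quality_map(
    pedestal_rms: Dict[Channel, float],
    known_bad: Set[Channel],
    max_rms: float,
    min_rms: float = 0.0,
) -> Dict[Channel, Quality]:
    """Classify each channel from its pedestal RMS using hypothetical cut values.

    Channels in the status map are reported as KNOWN_BAD so that only new problems
    stand out, mirroring how known issues are drawn in a different color from new
    anomalies in the DQM quality plots.
    """
    result: Dict[Channel, Quality] = {}
    for ch, rms in pedestal_rms.items():
        if ch in known_bad:
            result[ch] = Quality.KNOWN_BAD
        elif rms > max_rms or rms <= min_rms:  # too noisy, or no noise at all
            result[ch] = Quality.BAD
        else:
            result[ch] = Quality.GOOD
    return result


rms = {(0, 0): 1.1, (0, 1): 5.0, (1, 0): 0.0, (1, 1): 1.0}  # made-up pedestal RMS values
qmap = quality_map(rms, known_bad={(1, 1)}, max_rms=3.0)
# (0, 1) fails the upper cut, (1, 0) fails the lower cut, (1, 1) is already known.
```

Fixed cuts like these only catch problems that move a channel's noise past a preset value, which is the gap the autoencoder-based system is meant to complement.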