Data Quality Monitoring for the Hadron Calorimeters Using Transfer Learning for Anomaly Detection

Mulugeta Weldezgina Asres; Christian Walter Omlin; Long Wang; Pavel Parygin; David Yu; Jay Dittmann; The CMS-HCAL Collaboration

TL;DR

This work tackles data-scarce, high-dimensional spatio-temporal (ST) anomaly detection (AD) in the CMS Hadron Calorimeter (HCAL) by applying transfer learning (TL) to the GraphSTAD hybrid autoencoder (CNN+GNN+RNN). By transferring from the HCAL Endcap (HE) to the HCAL Barrel (HB) and exploring multiple TL configurations on both the encoder and decoder, the study reports substantial reductions in trainable parameters and improved robustness to training-data contamination while preserving ST reconstruction and AD performance. The results indicate that careful TL of spatial encoders and temporal decoders, together with state-preserving RNNs and learning-rate scheduling, improves reconstruction accuracy and AD performance when target data are limited. These findings offer practical guidance for deploying TL-enabled ST AD in large-scale detector monitoring and may carry over to other ST data domains with similar structure and constraints.

Abstract

The proliferation of sensors brings an immense volume of spatio-temporal (ST) data in many domains, including monitoring, diagnostics, and prognostics applications. Data curation is a time-consuming process for a large volume of data, making it challenging and expensive to deploy data analytics platforms in new environments. Transfer learning (TL) mechanisms promise to mitigate data sparsity and model complexity by utilizing pre-trained models for a new task. Despite the triumph of TL in fields like computer vision and natural language processing, efforts on complex ST models for anomaly detection (AD) applications are limited. In this study, we present the potential of TL within the context of high-dimensional ST AD with a hybrid autoencoder architecture, incorporating convolutional, graph, and recurrent neural networks. Motivated by the need for improved model accuracy and robustness, particularly in scenarios with limited training data on systems with thousands of sensors, this research investigates the transferability of models trained on different sections of the Hadron Calorimeter of the Compact Muon Solenoid experiment at CERN. The key contributions of the study include exploring TL's potential and limitations within the context of encoder and decoder networks, revealing insights into model initialization and training configurations that enhance performance while substantially reducing trainable parameters and mitigating data contamination effects. Code: https://github.com/muleina/CMS_HCAL_ML_OnlineDQM
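
As an illustration of how a TL configuration reduces the number of trainable parameters, the following is a minimal sketch and not the authors' implementation. The block names, the parameter counts, and the choice of which blocks to freeze are all hypothetical placeholders; the actual configurations are those studied in the paper and its repository.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str              # hypothetical block name
    n_params: int          # hypothetical parameter count (not from the paper)
    trainable: bool = True


def build_autoencoder():
    """Toy stand-in for an encoder/decoder autoencoder, split into blocks."""
    return [
        Block("encoder.cnn_gnn", 40_000),  # spatial feature extraction
        Block("encoder.rnn", 25_000),      # temporal feature extraction
        Block("decoder.rnn", 25_000),      # temporal reconstruction
        Block("decoder.deconv", 30_000),   # spatial reconstruction
    ]


def apply_transfer(blocks, frozen_prefixes):
    """Freeze every block whose name starts with one of the given prefixes.

    A frozen block keeps the weights transferred from the source model and
    is excluded from training on the target data.
    """
    prefixes = tuple(frozen_prefixes)
    return [
        Block(b.name, b.n_params, trainable=not b.name.startswith(prefixes))
        for b in blocks
    ]


def trainable_params(blocks):
    """Count the parameters that would still be updated during training."""
    return sum(b.n_params for b in blocks if b.trainable)


# Assumed example configuration: reuse the spatial encoder and the temporal
# decoder from the source (HE) model, and train the rest on the target (HB).
source_model = build_autoencoder()
target_model = apply_transfer(source_model, ["encoder.cnn_gnn", "decoder.rnn"])

full = trainable_params(source_model)
reduced = trainable_params(target_model)
print(f"trainable parameters: {full} -> {reduced}")
```

In a deep learning framework, the same idea is usually expressed by loading the source weights into the target model and disabling gradient updates for the frozen blocks.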

Paper Structure (19 sections, 12 equations, 14 figures, 7 tables)

Introduction
Background
Transfer Learning on Deep Learning
The Hadron Calorimeter of the CMS Detector
CMS Data Quality Monitoring
Dataset Description
Methodology
Data Preprocessing
Digi-Occupancy Map Renormalization
Adjacency Matrix Generation
Anomaly Detection Mechanism
Transfer Learning Approach
Results and Discussion
Spatio-Temporal Reconstruction Performance
Transfer Learning on Spatial Learning Networks
...and 4 more sections

Figures (14)

Figure S1: Schematic of the CMS detector: (a) CMS with its major systems [focardi2012status], and (b) geometry axes and angles of the CMS with respect to the collision intersection point [tikz2023].
Figure S2: The subdetectors of the HCAL: (a) longitudinal view of the HB, HE, HF, and HO subdetectors on CMS [cheung2012cms]; and (b) longitudinal view of one quadrant of CMS with segmentation angle specifications of the $\eta$, where the origin denotes the interaction point [collaboration1999cms, collaboration2008cms].
Figure S3: A sample digi-occupancy map (year = 2018, RunId = 325,170, LS = 15): (a) digi-occupancy map for the HE and HB together; (b) the source system HE channels are placed in $|i\eta| \in [16, \dots, 29]$, $i\phi \in [1, \dots, 72]$, and depth $\in [1, \dots, 7]$; and (c) the target system HB channels are placed in $|i\eta| \in [1, \dots, 16]$, $i\phi \in [1, \dots, 72]$, and depth $\in [1, 2]$. The HE and HB share similarities and differences in tasks, calorimeter technology, and data characteristics. The missing sector in (b) corresponds to the two failed HE-RBX sectors during the 2018 collision runs.
Figure S4: Total digi-occupancy data distribution of the HB and run settings per map ($s$): the received luminosity ($\beta_s$) and the number of events ($\xi_s$). $N_1$ is the renormalization of $\gamma_s$ based on $\xi_s$, and $N_2$ is the reversible renormalization based on the median $\gamma$ along the $i\phi$ axis. The colors correspond to different collision runs.
Figure S5: The architecture of the proposed AE for the GraphSTAD system [mulugeta2022dqm]. The GNN and CNN provide spatial feature extraction for each time step, and the RNN captures the temporal behavior of the extracted features. The feature extraction network $\mathcal{E}_\theta$ incorporates the GNN for back-end physical connectivity among the spatial channels, the CNN for regional spatial proximity of the channels, and the RNN for temporal behavior extraction. The reconstruction network $\mathcal{D}_\omega$ contains RNNs and deconvolutional neural networks to reconstruct the ST input data from the low-dimensional latent features.
...and 9 more figures
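
The two renormalization steps named in the Figure S4 caption can be sketched as follows. This is only a hedged illustration under stated assumptions: $N_1$ is taken to be a division of the occupancy $\gamma_s$ by the number of events $\xi_s$, and $N_2$ a division of each $i\eta$ row by its median over $i\phi$, with the medians kept so the step can be inverted. The exact formulas are defined in the paper's methodology section, not here, and the function names and the toy single-depth map are hypothetical.

```python
from statistics import median


def per_event(occupancy_map, n_events):
    """Assumed form of N_1: scale each channel's occupancy by the event count."""
    return [[value / n_events for value in row] for row in occupancy_map]


def median_renorm(occupancy_map):
    """Assumed form of N_2: divide each i-eta row by its median along i-phi.

    Returns the scaled map and the per-row scales needed to undo the step.
    A zero median (for example, a mostly empty row) falls back to 1.0 so the
    division stays defined.
    """
    scales = [median(row) or 1.0 for row in occupancy_map]
    scaled = [[value / s for value in row] for row, s in zip(occupancy_map, scales)]
    return scaled, scales


def invert_median_renorm(scaled_map, scales):
    """Undo median_renorm using the stored per-row scales."""
    return [[value * s for value in row] for row, s in zip(scaled_map, scales)]


# Toy single-depth map: rows are i-eta, columns are i-phi (values are made up).
raw_map = [[100.0, 200.0, 300.0],
           [10.0, 20.0, 30.0]]

n1_map = per_event(raw_map, n_events=10)
n2_map, row_scales = median_renorm(n1_map)
restored = invert_median_renorm(n2_map, row_scales)
```

Keeping the per-row scales alongside the scaled map is what makes the second step reversible in this sketch: the reconstruction from the autoencoder can be mapped back to per-event occupancy by multiplying with the same scales.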
