Synthetic Data Generation with Lorenzetti for Time Series Anomaly Detection in High-Energy Physics Calorimeters

Laura Boggia, Bogdan Malaescu

TL;DR

Problem: robust anomaly detection in multivariate calorimeter time series with scarce labels. Approach: generate synthetic detector defects using the Lorenzetti Simulator to create datasets and evaluate multiple TSAD models (unsupervised baseline, iTransformer, TranAD, USAD) under varying pileup. Findings: inactive-module defects are readily detected; increased-noise defects are challenging due to simulator limitations, highlighting the need for richer noise models. Impact: provides a reusable framework for testing and benchmarking TSAD in high-energy physics calorimeters, guiding improvements in detector data quality monitoring for future LHC upgrades.

Abstract

Anomaly detection in multivariate time series is crucial to ensure the quality of data coming from a physics experiment. Accurately identifying the moments when unexpected errors or defects occur is essential, yet challenging due to scarce labels, unknown anomaly types, and complex correlations across dimensions. To address the scarcity and unreliability of labelled data, we use the Lorenzetti Simulator to generate synthetic events with injected calorimeter anomalies. We then assess the sensitivity of several time series anomaly detection methods, including transformer-based and other deep learning models. The approach employed here is generic and applicable to different detector designs and defects.

Paper Structure

This paper contains 10 sections, 2 figures.

Figures (2)

  • Figure 1: Cell energies after digitisation in $(\eta, \phi)$ phase space, aggregated over $10$ events. The left plot shows the default simulation; the right plot shows anomalies from inactive modules (missing deposits marked by cyan circles).
  • Figure 2: MCC scores (averaged over five random seeds) as a function of the anomaly rate in the test dataset without pileup (left) and with pileup (right) for three anomaly-labelling methods.
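
The MCC in the Figure 2 caption is the Matthews correlation coefficient, a score in $[-1, 1]$ computed from the binary confusion matrix. The following is a minimal sketch of how such a score could be computed per run and then averaged over seeds. The function names `mcc` and `mean_mcc_over_seeds` are hypothetical and the toy labels are made up for illustration; this is not the authors' evaluation code.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = anomalous)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mean_mcc_over_seeds(runs):
    """Average the MCC over runs, one (y_true, y_pred) pair per seed."""
    scores = [mcc(y_true, y_pred) for y_true, y_pred in runs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: two seeds with illustrative labels only.
    runs = [
        ([1, 0, 1, 0], [1, 0, 1, 0]),  # perfect detection
        ([1, 1, 0, 0], [1, 0, 1, 0]),  # no better than chance
    ]
    print(mean_mcc_over_seeds(runs))
```

Unlike accuracy, the MCC stays informative when anomalies are rare, which is relevant when the anomaly rate in the test dataset is varied.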