Synthetic Data Generation with Lorenzetti for Time Series Anomaly Detection in High-Energy Physics Calorimeters
Laura Boggia, Bogdan Malaescu
TL;DR
Problem: robust anomaly detection in multivariate calorimeter time series when labels are scarce. Approach: use the Lorenzetti Simulator to generate synthetic datasets with injected detector defects, then evaluate several time series anomaly detection (TSAD) models (an unsupervised baseline, iTransformer, TranAD, USAD) under varying pileup. Findings: inactive-module defects are readily detected; increased-noise defects are challenging due to simulator limitations, highlighting the need for richer noise models. Impact: provides a reusable framework for testing and benchmarking TSAD in high-energy physics calorimeters, guiding improvements in detector data quality monitoring for future LHC upgrades.
Abstract
Anomaly detection in multivariate time series is crucial to ensure the quality of data coming from a physics experiment. Accurately identifying the moments when unexpected errors or defects occur is essential, yet challenging due to scarce labels, unknown anomaly types, and complex correlations across dimensions. To address the scarcity and unreliability of labelled data, we use the Lorenzetti Simulator to generate synthetic events with injected calorimeter anomalies. We then assess the sensitivity of several time series anomaly detection methods, including transformer-based and other deep learning models. The approach employed here is generic and applicable to different detector designs and defects.
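The defect-injection idea can be illustrated with a minimal NumPy sketch. It does not use the Lorenzetti Simulator or its API, and it is not the authors' pipeline. The per-module signal is a hypothetical Gaussian stand-in for calorimeter readout, and the function names (`make_series`, `inject_inactive_module`, `inject_increased_noise`, `zscore_scores`) are assumptions made for illustration. The z-score detector is a simple placeholder, not one of the evaluated models.

```python
import numpy as np


def make_series(n_steps=500, n_channels=8, seed=0):
    """Hypothetical stand-in for per-module calorimeter readings over time."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=10.0, scale=1.0, size=(n_steps, n_channels))


def _window_labels(n_steps, start, stop):
    """Boolean mask marking the time steps inside the defect window."""
    labels = np.zeros(n_steps, dtype=bool)
    labels[start:stop] = True
    return labels


def inject_inactive_module(series, channel, start, stop):
    """Emulate an inactive module: the channel reads zero in [start, stop)."""
    out = series.copy()
    out[start:stop, channel] = 0.0
    return out, _window_labels(len(series), start, stop)


def inject_increased_noise(series, channel, start, stop, extra_sigma, seed=1):
    """Emulate increased noise: add zero-mean Gaussian noise in [start, stop)."""
    rng = np.random.default_rng(seed)
    out = series.copy()
    out[start:stop, channel] += rng.normal(0.0, extra_sigma, size=stop - start)
    return out, _window_labels(len(series), start, stop)


def zscore_scores(reference, test):
    """Per-time-step anomaly score: largest absolute z-score over channels,
    using the mean and standard deviation of a defect-free reference sample."""
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0)
    return np.abs((test - mu) / sigma).max(axis=1)


if __name__ == "__main__":
    clean = make_series()
    dead, labels = inject_inactive_module(clean, channel=3, start=200, stop=250)
    scores = zscore_scores(clean[:200], dead)
    print("mean score inside defect window: ", scores[labels].mean())
    print("mean score outside defect window:", scores[~labels].mean())
```

In this toy setting the inactive-module window stands out clearly, because a zeroed channel sits far from its reference mean. A modest increase in noise shifts the score distribution much less, which is consistent in spirit with the reported difficulty of noise defects.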
