Triggering Dark Showers with Conditional Dual Auto-Encoders

Luca Anzalone; Simranjit Singh Chhibra; Benedikt Maier; Nadezda Chernyavskaya; Maurizio Pierini

TL;DR

The paper tackles model-independent searches for new physics in collider data by reframing signal detection as anomaly detection on raw detector images. It introduces Conditional Dual Auto-Encoders (CoDAEs) and a categorical variant (CoDVAE) that leverage dual encoders and spatial conditioning to learn a compact latent space for robust anomaly scoring without requiring signal simulations during training. Evaluated on simulated CMS-like data for Hidden Valley scenarios (SUEP and SVJ), the approach achieves competitive AUROC and low FPR40, often surpassing traditional baselines and approaching supervised performance, while enabling fast inference suitable for high-level triggering. These results support deploying such fast, model-agnostic anomaly detectors in real-time trigger systems to enable generic discovery of unknown signals with reduced reliance on specific signal hypotheses.
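The two figures of merit quoted above can be made concrete with a short sketch. It assumes that FPR40 denotes the false-positive rate at the score threshold where 40% of signal events are accepted, and that a higher anomaly score means "more anomalous"; the function names `auroc` and `fpr_at_tpr` and the toy scores are illustrative and do not come from the paper.

```python
import math


def auroc(bkg_scores, sig_scores):
    """Area under the ROC curve, computed as the probability that a random
    signal event scores higher than a random background event (ties count 1/2)."""
    wins = 0.0
    for s in sig_scores:
        for b in bkg_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(sig_scores) * len(bkg_scores))


def fpr_at_tpr(bkg_scores, sig_scores, target_tpr=0.40):
    """False-positive rate at the highest threshold that still accepts at
    least `target_tpr` of the signal events (assumed meaning of FPR40)."""
    ranked = sorted(sig_scores, reverse=True)
    # Number of signal events that must pass; the small offset guards
    # against floating-point round-up in target_tpr * len(ranked).
    k = max(1, math.ceil(target_tpr * len(ranked) - 1e-12))
    threshold = ranked[k - 1]
    accepted_bkg = sum(1 for b in bkg_scores if b >= threshold)
    return accepted_bkg / len(bkg_scores)


# Toy anomaly scores (hypothetical, for illustration only).
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.85, 0.05, 0.15, 0.25]
signal = [0.9, 0.8, 0.7, 0.3, 0.2]
print("AUROC:", auroc(background, signal))
print("FPR40:", fpr_at_tpr(background, signal, target_tpr=0.40))
```

The pairwise AUROC is quadratic in the number of events, which is acceptable for a sketch; a rank-based computation would be preferable on realistic sample sizes.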

Abstract

We present a family of conditional dual auto-encoders (CoDAEs) for generic and model-independent new physics searches at colliders. New physics signals, which arise from new types of particles and interactions, are treated in our study as anomalies that cause deviations in data with respect to expected background events. In this work, we perform normal-only anomaly detection, which employs only background samples, to search for manifestations of a dark version of the strong force by applying (variational) auto-encoders to raw detector images, which are large and highly sparse, without relying on any physics-based pre-processing or strong assumptions about the signals. The proposed CoDAE has a dual-encoder design, which is general and can learn an auxiliary yet compact latent space through spatial conditioning. It shows a clear improvement over competitive physics-based baselines and related approaches, thereby reducing the gap with fully supervised models. This is the first time an unsupervised model has been shown to exhibit excellent discrimination against multiple dark shower models, illustrating the suitability of this method as an accurate, fast, model-independent algorithm to deploy, e.g., in the real-time event triggering systems of Large Hadron Collider experiments such as ATLAS and CMS.

Paper Structure (23 sections, 4 equations, 6 figures, 4 tables, 2 algorithms)

Introduction
The New Physics Search Scenario: Hidden valley models
The CMS Detector and simulated samples
Related Work
Reconstruction-based Anomaly Detection
Latent-based Anomaly Detection
Related High-Energy Physics Analyses
Simulated Dataset of particle collisions
Method
Image feature-engineering
Image Augmentations
The Dual Encoders
The Conditional Decoder
Categorical CoDVAE
Anomaly Scores
...and 8 more sections

Figures (6)

Figure 1: The CMS coordinate system, used to describe a particle's motion within the cylindrical detector. Figure adapted from https://tikz.net/axis3d_cms/.
Figure 2: Energy, $I$, (left) and mask, $I_m$, (right) images of a single sample. Track information turned out to be more discriminative than energy deposits, so we train on $I_m$ instead of $I$.
Figure 3: Example of five kinds of data augmentations, demonstrated on a random QCD sample considering energy deposits, whose value scale is depicted by the colored bar.
Figure 4: The architecture of both CoDAE and CoDVAE. Convolutional and upsampling blocks are shown in yellow and orange, respectively; gray dashed arrows and a $\oplus$ symbol depict a skip (or residual) connection (He et al., 2016); blue solid arrows and the $\circledcirc$ symbol denote spatial conditioning of $Z_m$ on $Z$. Each block uses $3\times 3$ convolutions followed by instance normalization (Ulyanov et al., 2016) and Leaky-ReLU (Maas et al., 2013). Down-sampling in the encoder is performed by strided convolutions instead of pooling.
Figure 5: The auxiliary latent space, $Z_m$, of the CoDAE (left) and CoDVAE (right), along with kernel density estimates of its components; the best components are $Z_2$ and $Z_1$, respectively. As the density plots show, QCD (blue) and SUEPs (orange) are well separated, whereas the SVJs (green) span the region between the two classes.
...and 1 more figures
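
The building block described in the Figure 4 caption (a $3\times 3$ convolution, then instance normalization, then Leaky-ReLU, with down-sampling done by a strided convolution rather than pooling) can be sketched in plain Python. This is a single-channel toy under stated assumptions, not the authors' implementation: the zero padding of one pixel, the Leaky-ReLU slope of 0.2, the absence of learnable scale and shift in the normalization, and all function names are choices made here for illustration.

```python
import math


def conv3x3(image, kernel, stride=1):
    """3x3 cross-correlation with zero padding of 1 (assumed); stride > 1
    down-samples the output, replacing a pooling layer."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, w, stride):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # pixels outside count as zero
                        acc += kernel[di + 1][dj + 1] * image[y][x]
            row.append(acc)
        out.append(row)
    return out


def instance_norm(fmap, eps=1e-5):
    """Normalize one feature map of one sample to zero mean and unit variance."""
    values = [v for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in fmap]


def leaky_relu(fmap, slope=0.2):
    """Pass positive values through and scale negative values by `slope` (assumed 0.2)."""
    return [[v if v > 0 else slope * v for v in row] for row in fmap]


def conv_block(image, kernel, stride=1):
    """Convolution -> instance normalization -> Leaky-ReLU, in the order given in the caption."""
    return leaky_relu(instance_norm(conv3x3(image, kernel, stride)))


# A sparse 4x4 toy "detector image" and an arbitrary averaging kernel.
toy_image = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
]
toy_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
downsampled = conv_block(toy_image, toy_kernel, stride=2)
print(len(downsampled), "x", len(downsampled[0]))
```

With `stride=2` the spatial size of the toy image is halved in each dimension, which is how a strided convolution takes over the role of pooling in the encoder.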
