Anomalies, Representations, and Self-Supervision

Barry M. Dillon, Luigi Favaro, Friedrich Feiden, Tanmoy Modak, Tilman Plehn

TL;DR

A self-supervised method for density-based anomaly detection using contrastive learning, which yields significant improvements on performance metrics for all signals compared to the raw-data baseline.

Abstract

We develop a self-supervised method for density-based anomaly detection using contrastive learning, and test it using event-level anomaly data from CMS ADC2021. The AnomalyCLR technique is data-driven and uses augmentations of the background data to mimic non-Standard-Model events in a model-agnostic way. It uses a permutation-invariant Transformer Encoder architecture to map the objects measured in a collider event to the representation space, where the data augmentations define a representation space which is sensitive to potential anomalous features. An AutoEncoder trained on background representations then computes anomaly scores for a variety of signals in the representation space. With AnomalyCLR we find significant improvements on performance metrics for all signals when compared to the raw data baseline.
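The last step of the abstract, scoring events by how poorly a background-trained AutoEncoder reconstructs them, can be illustrated with a minimal sketch. This is not the paper's implementation: a linear autoencoder (equivalent to PCA) stands in for the AutoEncoder, randomly generated vectors stand in for the contrastive representations, and the names `fit_linear_autoencoder` and `anomaly_score` are hypothetical. It only shows the idea that events far from the background distribution reconstruct badly and therefore receive a high anomaly score.

```python
import numpy as np

def fit_linear_autoencoder(background, latent_dim):
    """Fit a linear autoencoder (PCA) on background-only vectors.

    Assumption: a linear model replaces the paper's AutoEncoder for brevity.
    """
    mean = background.mean(axis=0)
    # Rows of vt are the principal directions of the centred background.
    _, _, vt = np.linalg.svd(background - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def anomaly_score(x, mean, components):
    """Per-event mean squared reconstruction error (higher = more anomalous)."""
    centred = x - mean
    # Encode into the latent space, then decode back to the input space.
    reconstructed = centred @ components.T @ components
    return np.mean((centred - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)

# Toy stand-in for background representations:
# 8-dimensional vectors lying close to a 2-dimensional subspace.
basis = rng.normal(size=(2, 8))
background = rng.normal(size=(2000, 2)) @ basis + 0.05 * rng.normal(size=(2000, 8))

# Toy stand-in for a signal: isotropic vectors that leave that subspace.
signal = rng.normal(size=(500, 8))

mean, components = fit_linear_autoencoder(background, latent_dim=2)
bkg_scores = anomaly_score(background, mean, components)
sig_scores = anomaly_score(signal, mean, components)

print("mean background score:", bkg_scores.mean())
print("mean signal score:", sig_scores.mean())
```

In this toy setup the signal scores are on average much larger than the background scores, so a threshold on the score separates the two. In the method described above, the same scoring is applied to the learned representations rather than to the raw event data.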

Paper Structure (12 sections, 6 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Comparison between the AE on raw data and the AE on the CLR representations trained with the $\mathcal{L}^+_{\text{AnomCLR}}$ loss function.
  • Figure 2: Results of a scan over the anomaly augmentations used with the $\mathcal{L}^+_{\text{AnomCLR}}$ loss function. The augmentations are defined in \ref{sec:applicationeventlevel}. The dashed lines show the performance of the baseline AutoEncoder trained on raw data.
  • Figure 3: Results of a scan over the representation dimension used with the $\mathcal{L}^+_{\text{AnomCLR}}$ loss function. The dashed lines show the performance of the baseline AutoEncoder trained on raw data.