Anomalies, Representations, and Self-Supervision

Barry M. Dillon; Luigi Favaro; Friedrich Feiden; Tanmoy Modak; Tilman Plehn

TL;DR

A self-supervised method for density-based anomaly detection using contrastive learning, which yields significant improvements on performance metrics for all signals when compared to the raw data baseline.

Abstract

We develop a self-supervised method for density-based anomaly detection using contrastive learning, and test it using event-level anomaly data from CMS ADC2021. The AnomalyCLR technique is data-driven and uses augmentations of the background data to mimic non-Standard-Model events in a model-agnostic way. It uses a permutation-invariant Transformer Encoder architecture to map the objects measured in a collider event to a representation space, which the data augmentations make sensitive to potentially anomalous features. An AutoEncoder trained on background representations then computes anomaly scores for a variety of signals in the representation space. With AnomalyCLR we find significant improvements on performance metrics for all signals when compared to the raw data baseline.
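
The role of the anomaly-augmentations can be illustrated with a minimal sketch. This summary does not give the exact form of $\mathcal{L}^+_{\text{AnomCLR}}$, so the code below is an assumption for illustration only: a generic softmax-style contrastive loss with one positive partner (a physics-preserving augmentation) and one negative partner (an anomaly-augmentation) per event. The function names, the temperature `tau`, and the single-negative setup are hypothetical and not taken from the paper.

```python
import numpy as np

def normalize(z):
    """Project representations onto the unit sphere."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def anomaly_contrastive_loss(z, z_phys, z_anom, tau=0.1):
    """Illustrative loss (assumed form, not the paper's definition).

    Pulls each event towards its physical augmentation and pushes it
    away from its anomaly-augmentation.

    z, z_phys, z_anom: arrays of shape (batch, dim).
    tau: temperature, a hypothetical default.
    """
    z, z_phys, z_anom = normalize(z), normalize(z_phys), normalize(z_anom)
    s_pos = np.sum(z * z_phys, axis=1) / tau  # similarity to the positive partner
    s_neg = np.sum(z * z_anom, axis=1) / tau  # similarity to the anomalous partner
    # -log( exp(s_pos) / (exp(s_pos) + exp(s_neg)) ), averaged over the batch
    return float(np.mean(np.logaddexp(s_pos, s_neg) - s_pos))

# Toy representations standing in for encoder outputs
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
good = anomaly_contrastive_loss(z, z, -z)  # aligned positives, opposite negatives
bad = anomaly_contrastive_loss(z, -z, z)   # the reverse configuration
```

In this toy setup the loss is small when an event sits close to its physical augmentation and far from its anomaly-augmentation, and large in the reverse case, which is the sense in which the augmentations shape the representation space.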

Paper Structure (12 sections, 6 equations, 3 figures, 2 tables)

Introduction
Dataset
AnomalyCLR
Contrastive learning
CLR for anomaly detection
Application to event-level anomalies
Anomaly scores
Results
Comparison of methods
The effect of anomaly-augmentations
The effect of representation dimension
Summary & conclusions

Figures (3)

Figure 1: Comparison between the AE on raw data and the AE on the CLR representations trained with the $\mathcal{L}^+_{\text{AnomCLR}}$ loss function.
Figure 2: Results of a scan on the anomaly-augmentations used with the $\mathcal{L}^+_{\text{AnomCLR}}$ loss function. The augmentations are defined in the section "Application to event-level anomalies". The dashed lines here correspond to the AutoEncoder on raw data baseline performance.
Figure 3: Results of a scan on the representation dimension used with the $\mathcal{L}^+_{\text{AnomCLR}}$ loss function. The dashed lines here correspond to the AutoEncoder on raw data baseline performance.
