Anomaly preserving contrastive neural embeddings for end-to-end model-independent searches at the LHC
Kyle Metzger, Lana Xu, Mia Sodini, Thea K. Arrestad, Katya Govorkova, Gaia Grosso, Philip Harris
TL;DR
The paper addresses anomaly detection at the LHC by learning compact, anomaly-preserving event representations with contrastive neural embeddings. It compares supervised and self-supervised contrastive objectives for both MLP and Transformer encoders and evaluates the resulting embeddings as inputs to signal-agnostic statistical tests. Supervised contrastive learning delivers the strongest gains across diverse backgrounds and unseen signals, and Transformer encoders offer advantages for complex patterns. Applied to the Delphes-simulated ADC2021 dataset and to a challenging black-box test, the approach substantially improves discovery power and demonstrates the feasibility of end-to-end, model-independent searches at the LHC.
Abstract
Anomaly detection, the identification of deviations from Standard Model predictions, is a key challenge at the Large Hadron Collider because of the size and complexity of its datasets. This is typically addressed by transforming high-dimensional detector data into lower-dimensional, physically meaningful features. We tackle feature extraction for anomaly detection by learning powerful low-dimensional representations via contrastive neural embeddings. This approach preserves potential anomalies indicative of new physics and enables rare-signal extraction using novel machine-learning-based statistical methods for signal-independent hypothesis testing. We compare supervised and self-supervised contrastive learning methods, for both MLP- and Transformer-based neural embeddings, trained on the kinematic observables of physics objects in LHC collision events. The learned embeddings serve as input representations for signal-agnostic statistical detection methods in inclusive final states. We achieve significant improvements in discovery power for both rare new physics signals and rare Standard Model processes across diverse final states, demonstrating that the approach can efficiently search for diverse signals simultaneously. We study the impact of architectural choices, contrastive loss formulations, supervision levels, and embedding dimensionality on anomaly detection performance. We show that the optimal representation for background classification does not always maximize sensitivity to new physics signals, revealing an inherent trade-off between background structure preservation and anomaly enhancement. We demonstrate that combining compression with domain knowledge for label encoding produces the most effective data representation for statistical discovery of anomalies.
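To make the distinction between supervised and self-supervised contrastive objectives concrete, the sketch below implements the standard supervised contrastive (SupCon) loss of Khosla et al. (2020) in NumPy. It is an illustrative assumption, not the authors' implementation: the function name `supcon_loss`, the temperature of 0.1, and the use of integer class labels (for example, Standard Model process labels) as the source of supervision are all hypothetical, and the paper's exact loss formulations may differ. The same function covers the self-supervised (SimCLR-style) case when the only "label" shared between two samples is that they are augmented views of the same event.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Hypothetical sketch of the SupCon loss (Khosla et al., 2020).

    z:      (N, D) array of embeddings, e.g. encoder outputs for N events.
    labels: (N,) integer array; samples sharing a label are positives.
            For a self-supervised objective, give the two augmented
            views of the same event the same label.
    """
    # Project embeddings onto the unit hypersphere
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z.shape[0]
    # Pairwise cosine similarities scaled by the temperature
    sim = z @ z.T / temperature
    # Exclude each sample's similarity with itself
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable row-wise log-softmax over all other samples
    row_max = np.max(sim, axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # Positives: same label, different sample
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0
    # Mean log-probability of the positives for each anchor that has one
    pos_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return -np.mean(pos_sum[valid] / n_pos[valid])
```

In the supervised setting, `labels` would hold a class index per event; in the self-supervised setting with two views per event in a batch of `B` events, one could pass `labels = np.repeat(np.arange(B), 2)`. Embeddings that cluster same-label events tightly and separate different labels drive the loss toward zero.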
