Searching for Anomalies with Foundation Models

Vinicius Mikuni, Benjamin Nachman

Abstract

Foundation models have the potential to extend the discovery reach for anomaly detection searches. When the large OmniLearned foundation model was evaluated on data from the CMS experiment, unexpected behavior was observed in a mass sideband. The purpose of this paper is to perform a full analysis, including a complete background estimate, on the phase space picked out by the large model. We find that the background estimate describes the data well in validation regions but is unable to accurately model the signal region. We invite further scrutiny of these events and our methods.

Paper Structure

This paper contains 12 sections, 1 equation, and 21 figures.

Figures (21)

  • Figure 1: A histogram of the groomed jet mass after selecting the most anomalous jets based on the small or large OmniLearned models. A parametric fit to the sidebands is also shown. The fit is excellent for the small model and poor for the large model.
  • Figure 2: Visual representation of the eight different regions used simultaneously to fit and constrain the contribution of different backgrounds.
  • Figure 3: Compatibility test between the ABCD prediction obtained from simulated QCD events and the soft drop mass distribution of QCD events in the region where both jets pass the $0.2\%$ data efficiency selection. A constant fit is performed to determine the compatibility between the prediction and the actual distribution, shown in blue.
  • Figure 4: Distribution of the leading-jet soft drop mass in events where both jets are classified as anomalous by the small OmniLearned model score. The region where both jets have low $\tau_{21}$ values is shown on the right, while the region where at least one jet fails the $\tau_{21}$ selection is shown on the left. Shaded regions represent the total background uncertainty.
  • Figure 5: Compatibility test between the ABCD prediction obtained from simulated QCD events and the soft drop mass distribution of QCD events in the region where both jets pass the $0.2\%$ data efficiency selection based on the large OmniLearned model. A constant fit is performed to determine the compatibility between the prediction and the actual distribution, shown in blue.
  • ...and 16 more figures
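The compatibility tests in Figures 3 and 5 compare an ABCD background prediction with the true mass distribution and quantify the agreement with a constant fit. What follows is a minimal NumPy sketch of those steps under stated assumptions, not the authors' implementation. The fixed data-efficiency cut is taken as a quantile of the anomaly score. The prediction for the region where both jets pass uses the basic four-region relation N(P,P) = N(P,F) * N(F,P) / N(F,F) in each mass bin, which assumes that the two jets' pass/fail decisions are independent for background. The constant fit is a weighted least-squares fit to the per-bin ratio of observed to predicted counts. The paper's own estimate fits eight regions simultaneously (Figure 2). All function names and the toy inputs below are hypothetical.

```python
import numpy as np


def efficiency_threshold(scores, efficiency=0.002):
    """Hypothetical helper: anomaly-score cut keeping a fraction `efficiency` of jets.

    A 0.2% data efficiency selection corresponds to the 99.8th percentile.
    """
    return float(np.quantile(scores, 1.0 - efficiency))


def abcd_prediction(mass, pass1, pass2, bins):
    """Hypothetical helper: per-bin ABCD estimate of the both-jets-pass histogram.

    Assumes the two jets' pass/fail decisions are independent for background,
    so N(P,P) = N(P,F) * N(F,P) / N(F,F) in each mass bin.
    Returns the prediction and its Poisson relative uncertainty.
    """
    def hist(sel):
        return np.histogram(mass[sel], bins=bins)[0].astype(float)

    n_pf = hist(pass1 & ~pass2)
    n_fp = hist(~pass1 & pass2)
    n_ff = hist(~pass1 & ~pass2)
    with np.errstate(divide="ignore", invalid="ignore"):
        pred = n_pf * n_fp / n_ff
        rel_err = np.sqrt(1.0 / n_pf + 1.0 / n_fp + 1.0 / n_ff)
    # Bins with an empty control region get no prediction.
    empty = (n_pf == 0) | (n_fp == 0) | (n_ff == 0)
    pred[empty] = np.nan
    rel_err[empty] = np.nan
    return pred, rel_err


def constant_fit(ratio, sigma):
    """Weighted least-squares fit of a constant to per-bin ratios.

    Returns the fitted constant, its uncertainty, and chi2 per degree of freedom.
    Non-finite points (e.g. empty bins) are skipped.
    """
    ok = np.isfinite(ratio) & np.isfinite(sigma) & (sigma > 0)
    w = 1.0 / sigma[ok] ** 2
    c = np.sum(w * ratio[ok]) / np.sum(w)
    chi2 = np.sum(w * (ratio[ok] - c) ** 2)
    return c, 1.0 / np.sqrt(np.sum(w)), chi2 / (ok.sum() - 1)


# Toy background (hypothetical): mass and the two jet scores are mutually
# independent, so the ABCD assumption holds by construction.
rng = np.random.default_rng(0)
n = 200_000
mass = rng.exponential(60.0, n)
score1, score2 = rng.uniform(size=n), rng.uniform(size=n)
eff = 0.1  # much looser than 0.2% so the toy has enough events per bin
pass1 = score1 > efficiency_threshold(score1, eff)
pass2 = score2 > efficiency_threshold(score2, eff)

bins = np.linspace(0.0, 200.0, 11)
pred, rel_err = abcd_prediction(mass, pass1, pass2, bins)
obs = np.histogram(mass[pass1 & pass2], bins=bins)[0].astype(float)
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = obs / pred
    sigma = ratio * np.sqrt(1.0 / obs + rel_err**2)
c, c_err, chi2_ndf = constant_fit(ratio, sigma)
print(f"constant = {c:.3f} +/- {c_err:.3f}, chi2/ndf = {chi2_ndf:.2f}")
```

With independent toy inputs, the fitted constant should lie close to 1. In a real check, a constant significantly different from 1 or a poor chi2/ndf would point to a correlation between the two jets' selections that the simple four-region relation does not capture.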