
Learning the Standard Model Manifold: Bayesian Latent Diffusion for Collider Anomaly Detection

Jigar Patel, Tommaso Dorigo

TL;DR

This work proposes a physics-informed anomaly detection framework for collider data based on a Bayesian latent diffusion model that combines a probabilistic encoder with diffusion dynamics in the latent space, allowing for stable and flexible density estimation while explicitly enforcing physics constraints.

Abstract

We propose a physics-informed anomaly detection framework for collider data based on a Bayesian latent diffusion model. Our method combines a probabilistic encoder with diffusion dynamics in the latent space, allowing for stable and flexible density estimation while explicitly enforcing physics constraints, such as mass decorrelation and regularization of latent correlations. We train and test the model on simulated LHC jet data and evaluate its performance using seed-averaged ROC curves together with discovery-oriented metrics. Through a series of ablation studies, we show that the diffusion process, Bayesian regularization, and physics-motivated loss terms contribute in complementary ways: they help stabilize training and improve generalization, even when the gains in peak performance are moderate. Overall, our results emphasize the importance of incorporating both uncertainty estimates and physics consistency when building reliable anomaly detection methods for new-physics searches in high-energy physics.

Paper Structure (46 sections, 9 equations, 18 figures, 8 tables)

Figures (18)

  • Figure 1: Overview of the Bayesian latent diffusion framework. Input events are encoded into stochastic latent representations, refined by latent diffusion, and reconstructed by the decoder. The anomaly score is derived from the uncertainty-normalized reconstruction error, and training is governed by the combined physics-aware objective $\mathcal{L}_{\mathrm{total}}$.
  • Figure 2: Normalized distributions of the 14 input features for the QCD background (blue solid) and the $W^\prime$ signal (red dashed), prior to any selection on the anomaly score. Rows 1--2 correspond to the leading jet $j_1$; rows 3--4 to the subleading jet $j_2$; and the final row presents $\tau_3$ for both jets. All distributions are normalized to unit area.
  • Figure 3: Seed-averaged evolution of the total training loss and its individual components for the baseline model. Shaded bands indicate the standard deviation across six independent random seeds, demonstrating stable and well-balanced optimization.
  • Figure 4: Seed-averaged Pearson correlation coefficients between the anomaly score and the jet substructure observables $\tau_1$, $\tau_2$, and $\tau_3$ during training. The smooth convergence and narrow uncertainty bands indicate stable physics-aware learning.
  • Figure 5: Seed-averaged ROC curve for the baseline model. The solid line shows the mean ROC across six independent random seeds, while the shaded band represents the corresponding standard deviation. The dashed line indicates the performance of a random classifier.
  • ...and 13 more figures
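
The seed-averaged ROC curve with a standard-deviation band described in the Figure 5 caption can be illustrated with a short sketch. This is not the authors' code: the helper names `roc_curve_np` and `seed_averaged_roc`, the common false-positive-rate grid, and the Gaussian toy scores standing in for anomaly scores are all assumptions made for illustration. Only the use of six independent seeds follows the captions.

```python
import numpy as np

def roc_curve_np(scores, labels):
    """Return (fpr, tpr) for scores where higher means more anomalous.

    Assumes continuous scores (no special handling of ties) and at least
    one event in each class.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    order = np.argsort(-scores, kind="stable")  # descending score
    sorted_labels = labels[order]
    tps = np.cumsum(sorted_labels == 1)  # signal events above threshold
    fps = np.cumsum(sorted_labels == 0)  # background events above threshold
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

def seed_averaged_roc(score_list, label_list, grid):
    """Evaluate each seed's ROC on a common FPR grid, then average.

    Returns the mean TPR and its standard deviation across seeds.
    """
    curves = []
    for scores, labels in zip(score_list, label_list):
        fpr, tpr = roc_curve_np(scores, labels)
        # Step-function lookup: last ROC point whose FPR does not exceed the grid value.
        idx = np.searchsorted(fpr, grid, side="right") - 1
        curves.append(tpr[idx])
    curves = np.stack(curves)
    return curves.mean(axis=0), curves.std(axis=0)

# Toy stand-in data (hypothetical): one score sample per seed.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
score_list, label_list = [], []
for _ in range(6):  # six seeds, as in the figure captions
    background = rng.normal(0.0, 1.0, 2000)
    signal = rng.normal(1.5, 1.0, 500)
    score_list.append(np.concatenate([background, signal]))
    label_list.append(np.concatenate([np.zeros(2000, dtype=int),
                                      np.ones(500, dtype=int)]))

mean_tpr, std_tpr = seed_averaged_roc(score_list, label_list, grid)
```

Evaluating every seed on the same grid is what makes a pointwise mean and band well defined, since each seed's raw ROC curve has its own set of thresholds. Plotting `mean_tpr` against `grid` with a band of `mean_tpr ± std_tpr` would give a figure of the kind the caption describes.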