Explicit or Implicit? Encoding Physics at the Precision Frontier

Victor Breso-Pla; Kevin Greif; Vinicius Mikuni; Benjamin Nachman; Tilman Plehn; Tanvi Wamorkar; Daniel Whiteson

TL;DR

This work compares the performance of two representative models, the Lorentz-equivariant L-GATr and the pretrained OmniLearn, on three especially challenging tasks: reweighting-based unfolding, likelihood-ratio estimation, and weakly supervised anomaly detection.

Abstract

High-performance machine learning tools in particle physics rest on two complementary directions: encoding symmetries explicitly in the architecture, and learning the structure of the data implicitly through large-scale (pre-)training. We compare the performance of the representative L-GATr and OmniLearn models on three especially challenging tasks: reweighting-based unfolding, likelihood-ratio estimation, and weakly supervised anomaly detection. Across all benchmarks, both methods achieve comparable performance within the statistical precision of the finetuning datasets, suggesting that the significant efficiency gains from encoding known particle physics structures are largely independent of how that structure is encoded.
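
Both reweighting-based unfolding and likelihood-ratio estimation rest on the classifier-based likelihood-ratio trick: a classifier trained on balanced samples from two densities outputs f(x), and f/(1-f) approximates the ratio of the two densities. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a one-dimensional Gaussian toy, where the exact log-ratio is linear in x, and a hand-written logistic regression (the hypothetical helpers `fit_logistic` and `likelihood_ratio_weights`) standing in for L-GATr or OmniLearn.

```python
import numpy as np


def fit_logistic(x, y, lr=0.5, steps=500):
    """Fit p(y=1|x) = sigmoid(a*x + b) by full-batch gradient descent on the binary cross-entropy."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * x + b)))
        grad = p - y  # derivative of the cross-entropy with respect to the logit
        a -= lr * np.mean(grad * x)
        b -= lr * np.mean(grad)
    return a, b


def likelihood_ratio_weights(x_ref, x_target, **kwargs):
    """Per-event weights w(x) ~ p_target(x) / p_ref(x) for the reference events (hypothetical helper)."""
    x = np.concatenate([x_ref, x_target])
    y = np.concatenate([np.zeros_like(x_ref), np.ones_like(x_target)])
    a, b = fit_logistic(x, y, **kwargs)
    # With equal sample sizes, f / (1 - f) = exp(logit) estimates the density ratio.
    return np.exp(a * x_ref + b), (a, b)


rng = np.random.default_rng(0)
x_ref = rng.normal(0.0, 1.0, 100_000)     # toy stand-in for, e.g., simulation
x_target = rng.normal(0.5, 1.0, 100_000)  # toy stand-in for, e.g., data
w, (a, b) = likelihood_ratio_weights(x_ref, x_target)
print(a, b)                           # should be close to the exact values 0.5 and -0.125
print(np.average(x_ref, weights=w))   # reweighted mean should be close to the target mean 0.5
```

In this toy the exact ratio is exp(0.5x - 0.125), so the fitted slope and intercept can be checked directly, and reweighting the reference sample shifts its mean to that of the target. Iterative unfolding repeats a reweighting step of this kind several times, as in the five reweighting iterations shown in Figure 2.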

Paper Structure (11 sections, 7 equations, 3 figures, 7 tables)

Introduction
Explicit versus implicit physics knowledge
Lorentz-equivariant transformer
OmniLearn
Precision classification with similar classes
Reweighting-based unfolding for $pp$ collisions
Likelihood ratio estimation for $ep$ collisions
Weakly supervised anomaly detection
Outlook
Computational resource analysis
L-GATr details

Figures (3)

Figure 1: Reweighted distributions for six observables obtained from L-GATr, PET and OmniLearn after one classifier training. The L-GATr architecture has $10^6$ learnable parameters.
Figure 2: Unfolded distributions for six observables obtained from L-GATr, PET and OmniLearn after five reweighting iterations. The L-GATr architecture has $10^6$ learnable parameters.
Figure 3: Maximum SIC against the number of injected signal events for the randomly initialized PET and L-GATr models, as well as the pretrained OmniLearn. Solid lines plot the mean maximum SIC across an ensemble of 10 independent trainings, and shaded bands represent the 68% confidence band over the ensemble.
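
The metric in Figure 3 is the significance improvement characteristic, SIC = eps_S / sqrt(eps_B), where eps_S and eps_B are the signal and background efficiencies at a given threshold on the classifier score, maximized over all thresholds. The following is a minimal sketch, not the paper's evaluation code: the helper `max_sic`, the minimum background efficiency `min_eps_b` (used here to keep fluctuation-dominated tails out of the maximum), the Gaussian toy scores, and the use of the 16th and 84th percentiles as a 68% band over a 10-member ensemble are all assumptions.

```python
import numpy as np


def max_sic(scores_sig, scores_bkg, min_eps_b=1e-4):
    """Maximum of eps_S / sqrt(eps_B) over all score thresholds (hypothetical helper)."""
    thresholds = np.unique(np.concatenate([scores_sig, scores_bkg]))
    s_sorted = np.sort(scores_sig)
    b_sorted = np.sort(scores_bkg)
    # searchsorted(side="left") counts events with score < threshold,
    # so 1 - count / n is the fraction of events passing score >= threshold.
    eps_s = 1.0 - np.searchsorted(s_sorted, thresholds, side="left") / len(s_sorted)
    eps_b = 1.0 - np.searchsorted(b_sorted, thresholds, side="left") / len(b_sorted)
    ok = eps_b >= min_eps_b  # assumed cut: ignore thresholds with too little background left
    return np.max(eps_s[ok] / np.sqrt(eps_b[ok]))


# Toy ensemble of 10 "trainings": Gaussian scores stand in for classifier outputs.
rng = np.random.default_rng(1)
sics = [max_sic(rng.normal(1.0, 1.0, 5_000), rng.normal(0.0, 1.0, 50_000)) for _ in range(10)]
mean_sic = np.mean(sics)
band_lo, band_hi = np.percentile(sics, [16, 84])  # one possible 68% band over the ensemble
print(mean_sic, band_lo, band_hi)
```

The lowest threshold always passes every event, giving SIC = 1, so the maximum is well defined as long as `min_eps_b` does not exceed 1.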
