Finetuning Foundation Models for Joint Analysis Optimization

Matthias Vigl, Nicole Hartman, Lukas Heinrich

TL;DR

The paper investigates applying foundation-model workflows to high-energy physics by treating reconstruction as a learnable backbone and analysis as a downstream head that can be finetuned end-to-end. Through a demonstrator based on a heavy resonance decaying to two Higgs bosons, it shows that finetuning pretrained backbones yields notable gains in background rejection and data efficiency compared with frozen or from-scratch baselines, and that domain adaptation from larger jet datasets further improves performance. The study explores three architectures and three training strategies, highlighting that end-to-end optimization can achieve substantial improvements while reducing required labeled data. These findings suggest a practical path toward integrating foundation-model style pretraining, finetuning, and cross-dataset transfer in HEP analyses, with implications for calibration and task design in future work.
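The three training strategies compared here (frozen backbone, finetuned backbone, and from-scratch training) differ only in how the backbone is initialized and whether it receives gradient updates. The following is a minimal toy sketch of that distinction, not the paper's models: the scalar "backbone" and "head" weights, the `train` function, the learning rate, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import random


def train(strategy, data, pretrained_backbone=1.5, steps=200, lr=0.01, seed=0):
    """Fit the toy model prediction = head * (backbone * x) with squared error.

    strategy is one of "frozen", "finetuned", or "scratch" (hypothetical names).
    """
    rng = random.Random(seed)
    if strategy == "scratch":
        backbone = rng.uniform(0.5, 1.0)  # no pretraining: random initialization
    else:
        backbone = pretrained_backbone    # reuse the (assumed) pretrained weight
    head = rng.uniform(0.5, 1.0)          # the head is always trained from random init
    update_backbone = strategy in ("finetuned", "scratch")

    n = len(data)
    for _ in range(steps):
        grad_backbone = 0.0
        grad_head = 0.0
        for x, y in data:
            features = backbone * x       # "reconstruction" output
            err = head * features - y     # "analysis" output minus target
            grad_head += 2.0 * err * features / n
            grad_backbone += 2.0 * err * head * x / n
        head -= lr * grad_head
        if update_backbone:               # frozen: the backbone never changes
            backbone -= lr * grad_backbone

    loss = sum((head * backbone * x - y) ** 2 for x, y in data) / n
    return backbone, head, loss


# Synthetic regression target y = 3x, purely illustrative.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
for strategy in ("frozen", "finetuned", "scratch"):
    backbone, head, loss = train(strategy, data)
    print(f"{strategy:9s} backbone={backbone:.3f} head={head:.3f} loss={loss:.2e}")
```

In this toy setting every strategy can fit the target, so it illustrates only which parameters move, not the performance or data-efficiency differences the paper measures.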

Abstract

In this work we demonstrate that significant gains in performance and data efficiency can be achieved in High Energy Physics (HEP) by moving beyond the standard paradigm of sequential optimization of reconstruction and analysis components. We conceptually connect HEP reconstruction and analysis to modern machine learning workflows such as pretraining, finetuning, domain adaptation and high-dimensional embedding spaces, and quantify the gains in the example use case of searches for heavy resonances decaying via an intermediate di-Higgs system to four $b$-jets.

Paper Structure (15 sections, 3 equations, 12 figures, 5 tables)

This paper contains 15 sections, 3 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Strategies from modern machine learning such as large-scale pretraining, finetuning, domain adaptation and high-dimensional embeddings (green curves) can lead to significant performance gains over the traditional HEP approach, denoted here as S+HLF(frozen). Top: Performance evolution as a function of training dataset size. Bottom: Final performance at 10M training samples.
  • Figure 2: Modern machine learning and HEP data analysis exhibit conceptual similarities. Reconstruction plays the role of a backbone or foundation model yielding a general purpose representation of high-dimensional low-level data. The physics data analysis itself is a "head" that produces task-specific summary statistics.
  • Figure 3: Hierarchical neural network structures considered in this work with decreasing levels of structural constraints and manually engineered features.
  • Figure 4: Performance as a function of labeled examples across three training strategies shown for the investigated architectures. For all architectures we see a significant benefit from finetuning over a frozen backbone. Pretraining is significantly more performant than training from scratch. For very large datasets from-scratch training can exceed a frozen backbone.
  • Figure 5: Top: Performance metrics of S+HLF for pretext (left) and downstream (right) tasks. In finetuned training the learnable scalar in S+HLF trades off $Xbb$ performance against downstream task performance. In from-scratch training $Xbb$-tagging emerges as a useful subtask without supervision. Bottom: $Xbb$ performance of the learned scalar feature as a function of the number of training samples.
  • ...and 7 more figures