Generalizable Foundation Models for Calorimetry via Mixtures-of-Experts and Parameter Efficient Fine Tuning

Carlos Cardona-Giraldo, Cristiano Fanelli, James Giroux, Cole Granger, Benjamin Nachman, Gerald Sabin

Abstract

Modern particle physics experiments face an increasing demand for high-fidelity detector simulation as luminosities rise and computational requirements approach the limits of available resources. Deep generative models have emerged as promising surrogates for traditional Monte Carlo simulation, with recent advances drawing inspiration from large language models (LLMs) and next-token prediction paradigms. In this work, we introduce a generalizable foundation model for calorimetry built on next-token transformer backbones, designed to support modular adaptation across materials, particle species, and detector configurations. Our approach combines Mixture-of-Experts pre-training with parameter-efficient fine-tuning strategies to enable controlled, additive model expansion without catastrophic forgetting. The transformer backbone is pre-trained to generate electromagnetic showers across multiple absorber materials, while new materials are incorporated through the addition and tuning of lightweight expert modules. Extensions to new particle types are achieved via parameter-efficient fine-tuning and modular vocabularies, preserving the integrity of the base model. This design enables efficient, incremental knowledge integration as new simulation datasets become available, a critical requirement in realistic detector-development workflows. In addition, we demonstrate that next-token calorimeter models are computationally competitive with standard generative approaches under established LLM optimization procedures. These results establish next-token architectures as a viable path toward extensible, physics-aware foundation models for calorimetry and future high-energy physics experiments.
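As a concrete illustration of the additive Mixture-of-Experts expansion described above, the sketch below shows one way to append a material-specific expert to a frozen backbone layer. It is a minimal PyTorch sketch under assumed names and choices (ExpertFFN, MoELayer.add_expert, a dense softmax router), not the paper's implementation, whose routing and training details may differ.

```python
# Minimal sketch of additive MoE expansion; all names/dimensions are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertFFN(nn.Module):
    """One material-specific feed-forward expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    """Router plus experts; a new material is added as one new expert."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            ExpertFFN(d_model, d_hidden) for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)

    def add_expert(self, d_hidden: int):
        """Freeze everything pre-trained, then append one trainable expert
        and widen the router by a single logit."""
        d_model = self.router.in_features
        for p in self.parameters():
            p.requires_grad_(False)                         # freeze pre-trained parts
        self.experts.append(ExpertFFN(d_model, d_hidden))   # new trainable expert
        old = self.router
        self.router = nn.Linear(d_model, len(self.experts))
        with torch.no_grad():                               # carry over old routing
            self.router.weight[: old.out_features] = old.weight
            self.router.bias[: old.out_features] = old.bias

    def forward(self, x):
        # Dense (softmax) routing for clarity; a sparse top-k router also works.
        gates = F.softmax(self.router(x), dim=-1)           # (batch, tokens, E)
        out = torch.stack([e(x) for e in self.experts], dim=-1)
        return (out * gates.unsqueeze(-2)).sum(dim=-1)
```

After add_expert, only the appended expert and the widened router carry requires_grad=True, so an optimizer built from the remaining trainable parameters updates just those weights, mirroring the frozen-backbone adaptation described in the abstract.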

Figures (11)

  • Figure 1: Overview of the modular architecture for material and particle-species adaptation: The framework utilizes a core transformer backbone consisting of cross-attention and self-attention decoder blocks that remain frozen during secondary adaptation phases. Material extensibility is achieved through a Mixture-of-Experts (MoE) layer in which a router directs inputs to specialized modules, allowing new materials to be added by fine-tuning only a single new expert. When transitioning to different particle species, the model employs a parameter-efficient strategy using LoRA modules and expanded particle-specific vocabulary heads for pixel and energy prediction, while the base photon-model parameters remain static (a hedged sketch of this LoRA-plus-vocabulary adaptation follows the figure list). For subsequent material expansion of an adapted model, a new expert is integrated while the previously tuned LoRA and vocabulary components are frozen to preserve the learned particle-specific features. The system is conditioned throughout these stages by a combination of spatial, kinematic, and energy query embeddings alongside a unique particle identifier.
  • Figure 2: Validation of generative shower modeling for photons in tungsten: Distribution-level comparison of the ground-truth Geant4 reference sample (gray shaded) against the proposed method (blue) and the Omnijet-$\alpha_c$ baseline (green) for photons in tungsten. The top row depicts observables including visible cell energy (a.u.), total energy sum, and total number of hits. The bottom row depicts spatial shower profiles including the longitudinal center of gravity ($Z$), energy deposition per layer, and radial energy distribution. The lower panels show the ratio of each model to the Geant4 reference, where a ratio of $1.0$ (dashed line) indicates perfect agreement. A $3\sigma$ uncertainty profile is provided in the ratio plots, accounting for statistical error in both the generated and ground-truth samples.
  • Figure 3: Validation of generative shower modeling for photons in tantalum: Distribution-level comparison between the ground-truth Geant4 reference sample (gray shaded) and the proposed method (blue) for photons in tantalum.
  • Figure 4: Validation of energy kinematic conditioning through prepended context at 50 GeV: Distribution-level comparison between the ground-truth Geant4 reference sample (gray shaded) and the proposed method (blue) at a fixed initial energy of $50\,\mathrm{GeV}$ for photons in tungsten.
  • Figure 5: Fine-tuning efficiency for photon showers in lead across varying sample sizes: Comparison between the ground-truth Geant4 reference (black) and the fine-tuned model at $1\text{k}$ (blue), $10\text{k}$ (green), and full (red dashed) sample sizes. Lower panels display the ratio to the Geant4 reference, where a ratio of $1.0$ (dashed line) indicates perfect agreement. Shaded areas in the ratio plots depict the statistical uncertainty of the data samples added in quadrature with the uncertainty from bootstrapped fine-tunings, shown at the $3\sigma$ level. Results demonstrate that freezing the base tantalum/tungsten MoE backbone and updating only the lead-specific expert allows for high-fidelity adaptation even with limited training data.
  • ...and 6 more figures
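For the particle-species adaptation summarized in the Figure 1 caption, the following is a hedged sketch of the LoRA-plus-vocabulary mechanism. The wrapper class, rank, scaling, and vocabulary sizes are illustrative assumptions rather than the authors' code: base photon weights stay frozen, a low-rank update trains on top of them, and a new particle-specific vocabulary head is added alongside the frozen photon head.

```python
# Hedged sketch of species adaptation: frozen base weights, a trainable
# low-rank (LoRA) update, and a new vocabulary head. Names are assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W0 x + (alpha / r) * B(A(x)), with W0 frozen and A, B trainable."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the photon model intact
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)        # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))

# Illustrative dimensions (assumptions, not taken from the paper).
d_model, photon_vocab, pion_vocab = 512, 1024, 1024
attn_proj = LoRALinear(nn.Linear(d_model, d_model))  # wrap a frozen projection
photon_head = nn.Linear(d_model, photon_vocab)       # base head, kept frozen
for p in photon_head.parameters():
    p.requires_grad_(False)
pion_head = nn.Linear(d_model, pion_vocab)           # new trainable species head
```

Zero-initializing B makes the adapter an identity update at the start of fine-tuning, so training begins from the intact photon model; this standard LoRA initialization matches the caption's requirement that the base parameters remain static.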