
Forecasting Generative Amplification

Henning Bahl, Sascha Diefenbacher, Nina Elmer, Tilman Plehn, Jonas Spinner

TL;DR

The paper addresses how to quantify the statistical amplification of generative networks for LHC simulations without large holdout sets. It introduces two complementary approaches: averaging amplification, which compares integrals over phase-space volumes and separates statistical and model uncertainties using Bayesian networks or ensembles, and differential amplification, which leverages a Kolmogorov–Smirnov-based test with a likelihood-ratio summary to detect local amplification without volume averaging. Through toy Gaussian-ring data and a realistic top-pair production scenario with Lorentz-equivariant generators, the study demonstrates that amplification can occur in specific regions of phase space but is not guaranteed universally across distributions. The results provide a practical framework for uncertainty quantification in generative fast-simulation pipelines and offer guidance on when and where amplification is reliable for HL-LHC-scale data.
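To make the averaging idea concrete, here is a minimal sketch for a single phase-space region, assuming an ensemble of generators (not a Bayesian network) and assuming the integral is the fraction of events falling in the region. It assumes the ensemble spread contains both model and statistical fluctuations. It therefore subtracts the binomial (statistical) variance of one generated sample to isolate the model uncertainty, adds the two in quadrature, and converts the total back into an equivalent number of truth events. Defining the amplification as that equivalent number divided by the training size is an assumption made here for illustration; the function name and inputs are hypothetical and this is not the authors' implementation.

```python
import math
import statistics


def averaging_amplification(fractions, n_gen, n_train):
    """Hypothetical helper: amplification estimate for one phase-space region.

    fractions: fraction of generated events inside the region, one value per
               ensemble member, each computed from n_gen generated events.
    n_gen:     generated events per ensemble member.
    n_train:   number of training events.
    Returns (amplification, sigma_stat, sigma_model).
    """
    p = statistics.fmean(fractions)
    # Binomial (statistical) variance of one generated sample of size n_gen.
    var_stat = p * (1.0 - p) / n_gen
    # The ensemble spread mixes model and statistical fluctuations; subtract
    # the statistical part to isolate the model uncertainty (clipped at zero).
    var_model = max(statistics.variance(fractions) - var_stat, 0.0)
    var_total = var_stat + var_model
    # Number of truth events whose binomial variance equals the total variance.
    n_equiv = p * (1.0 - p) / var_total
    return n_equiv / n_train, math.sqrt(var_stat), math.sqrt(var_model)
```

With identical ensemble members the model uncertainty vanishes and the estimate reduces to `n_gen / n_train`; once the ensemble members disagree, the model term dominates and caps the amplification no matter how many events are generated.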

Abstract

Generative networks are perfect tools to enhance the speed and precision of LHC simulations. It is important to understand their statistical precision, especially when generating events beyond the size of the training dataset. We present two complementary methods to estimate the amplification factor without large holdout datasets. Averaging amplification uses Bayesian networks or ensembling to estimate amplification from the precision of integrals over given phase-space volumes. Differential amplification uses hypothesis testing to quantify amplification without any resolution loss. Applied to state-of-the-art event generators, both methods indicate that amplification is possible in specific regions of phase space, but not yet across the entire distribution.
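The hypothesis-testing idea behind differential amplification can be illustrated in one dimension with a plain two-sample Kolmogorov–Smirnov test. The sketch below is a simplified stand-in, not the paper's procedure: it omits the likelihood-ratio summary entirely and assumes that the "worth" of a large generated sample is the largest truth sample size the KS test cannot distinguish from it. The helper names, the toy Gaussian "generator" with a small mean bias, and the list of scanned sizes are all hypothetical. A single realization like this is noisy; in practice one would repeat it over many seeds.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def rejects(a, b, alpha=0.05):
    """Asymptotic two-sample KS test at significance level alpha."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))


def largest_indistinguishable_size(generated, truth_pool, sizes, alpha=0.05):
    """Hypothetical helper: largest truth sample size in `sizes` that the KS
    test cannot distinguish from the generated sample."""
    best = 0
    for n in sorted(sizes):
        if rejects(generated, truth_pool[:n], alpha):
            break
        best = n
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    n_train = 1_000
    # Toy "generator": a Gaussian with a small, hypothetical mean bias.
    generated = [rng.gauss(0.05, 1.0) for _ in range(200_000)]
    truth_pool = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    n_eq = largest_indistinguishable_size(
        generated, truth_pool, sizes=[1_000, 3_000, 10_000, 30_000, 100_000])
    print("equivalent truth events:", n_eq, "amplification ~", n_eq / n_train)
```

Smaller truth samples widen the KS acceptance band, so the test only starts rejecting once the truth sample is large enough to resolve the generator's bias; that crossover size, relative to the training size, plays the role of an amplification factor in this toy.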

Paper Structure

This paper contains 12 sections, 43 equations, 15 figures.

Figures (15)

  • Figure 1: Illustration of the averaging amplification estimate. The statistical uncertainty of the generated dataset is shown in green; the model uncertainty of the generative network is shown in orange.
  • Figure 2: Scaling of the mean quadratic deviation from the true integral value as a function of the number of samples drawn from the fitted Gaussian. The curve points are averaged over 50 independent experiments. replace $\sigma_\text{true}$ in all plots.
  • Figure 3: Training data and integral regions for Gaussian ring.
  • Figure 4: Actual standard deviation and estimated uncertainties for various radial integrals over a Gaussian ring. Left: 2D Gaussian ring. Right: 4D Gaussian ring.
  • Figure 5: Actual standard deviation and estimated uncertainties for radial integrals, including only regions with large radius $R$. Left: 2D Gaussian ring. Right: 4D Gaussian ring.
  • ...and 10 more figures
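The toy experiment of Figure 2 can be sketched in a few lines: fit a Gaussian to a finite training sample, draw an increasing number of events from the fit, and average the squared deviation of an integral from its true value over 50 independent experiments. The specific choices here are assumptions for illustration only: a standard-normal truth, the integration region x > 1, a training size of 1000, and the helper name `mean_quadratic_deviation`.

```python
import random
import statistics
from statistics import NormalDist

TRUTH = NormalDist(0.0, 1.0)
X_CUT = 1.0  # hypothetical integration region: x > X_CUT
TRUE_INTEGRAL = 1.0 - TRUTH.cdf(X_CUT)


def mean_quadratic_deviation(n_train, n_gen, n_exp=50, seed=0):
    """Average squared deviation of the generated-sample integral from the
    true value, over n_exp independent train/fit/sample experiments."""
    rng = random.Random(seed)
    squared = []
    for _ in range(n_exp):
        train = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        # Toy generative model: a Gaussian fitted to the training data.
        fit = NormalDist.from_samples(train)
        gen = [rng.gauss(fit.mean, fit.stdev) for _ in range(n_gen)]
        estimate = sum(x > X_CUT for x in gen) / n_gen
        squared.append((estimate - TRUE_INTEGRAL) ** 2)
    return statistics.fmean(squared)


if __name__ == "__main__":
    n_train = 1_000
    # Reference: squared binomial error of the integral from the training data alone.
    train_only = TRUE_INTEGRAL * (1.0 - TRUE_INTEGRAL) / n_train
    for n_gen in (100, 1_000, 10_000, 30_000):
        mqd = mean_quadratic_deviation(n_train, n_gen)
        print(f"n_gen={n_gen:>6}  MQD={mqd:.2e}  train-only={train_only:.2e}")
```

As `n_gen` grows, the sampling term shrinks like 1/`n_gen` while the fit error stays fixed, so the curve flattens. Comparing that plateau with the training-only reference indicates whether the fitted model amplifies the training statistics for this particular integral.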