Forecasting Generative Amplification
Henning Bahl, Sascha Diefenbacher, Nina Elmer, Tilman Plehn, Jonas Spinner
TL;DR
The paper addresses how to quantify the statistical amplification of generative networks for LHC simulations without large holdout sets. It introduces two complementary approaches. Averaging amplification compares integrals over phase-space volumes and separates statistical from model uncertainties using Bayesian neural networks or ensembles. Differential amplification uses a Kolmogorov–Smirnov-based test with a likelihood-ratio summary to detect local amplification without volume averaging. Using toy Gaussian-ring data and a realistic top-pair production scenario with Lorentz-equivariant generators, the study shows that amplification can occur in specific regions of phase space but is not guaranteed across the entire distribution. The results provide a practical framework for uncertainty quantification in generative fast-simulation pipelines and offer guidance on when and where amplification is reliable for HL-LHC-scale data.
Abstract
Generative networks are perfect tools to enhance the speed and precision of LHC simulations. It is important to understand their statistical precision, especially when generating events beyond the size of the training dataset. We present two complementary methods to estimate the amplification factor without large holdout datasets. Averaging amplification uses Bayesian networks or ensembling to estimate amplification from the precision of integrals over given phase-space volumes. Differential amplification uses hypothesis testing to quantify amplification without any resolution loss. Applied to state-of-the-art event generators, both methods indicate that amplification is possible in specific regions of phase space, but not yet across the entire distribution.
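The idea behind averaging amplification can be illustrated with a minimal toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: a one-dimensional Gaussian fit stands in for the generative network, bootstrap refits stand in for the ensemble, the phase-space volume is the tail x > cut, and the amplification factor is taken to be the number of independent events that would give the same binomial precision on the integral, divided by the training-set size. The function names `tail_fraction` and `averaging_amplification` are hypothetical.

```python
import numpy as np
from math import erf, sqrt


def tail_fraction(mu, sigma, cut):
    """Integral of a Gaussian N(mu, sigma) over the region x > cut."""
    return 0.5 * (1.0 - erf((cut - mu) / (sigma * sqrt(2.0))))


def averaging_amplification(n_train=1000, n_members=300, cut=1.5, seed=0):
    """Toy estimate of an amplification factor from an ensemble spread.

    Assumption: amplification = N_equiv / N_train, where N_equiv is the
    number of independent events whose binomial variance p(1-p)/N matches
    the ensemble variance of the integral over the chosen region.
    """
    rng = np.random.default_rng(seed)
    train = rng.normal(0.0, 1.0, size=n_train)  # toy training data

    # Stand-in for a network ensemble: refit the model on bootstrap resamples.
    integrals = np.empty(n_members)
    for i in range(n_members):
        boot = rng.choice(train, size=n_train, replace=True)
        integrals[i] = tail_fraction(boot.mean(), boot.std(ddof=1), cut)

    p = integrals.mean()               # ensemble mean of the integral
    var_model = integrals.var(ddof=1)  # ensemble spread of the integral
    n_equiv = p * (1.0 - p) / var_model
    return n_equiv / n_train


if __name__ == "__main__":
    print(f"toy amplification factor: {averaging_amplification():.2f}")
```

In this toy setup the fitted model constrains the tail integral better than simply counting training events in the tail, so the factor comes out above one. A real generative network has no such guaranteed inductive bias, which is why the factor has to be estimated region by region.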
