Data-Driven High-Dimensional Statistical Inference with Generative Models

Oz Amram; Manuel Szewc

TL;DR

HI-SIGMA introduces a data-driven, high-dimensional inference framework for resonant analyses at the LHC by learning multi-dimensional signal and background densities with generative models and performing unbinned likelihood fits. It factors densities around a resonance as $P_k(\boldsymbol{x})=P_k(\boldsymbol{x}'|m)P_k(m)$, uses an extended profile likelihood, and interpolates backgrounds from sidebands into the signal region, enabling robust uncertainty quantification. The method demonstrates improved sensitivity over traditional cut-based and low-bin classifier approaches in a di-Higgs $bb\gamma\gamma$ proxy, while maintaining interpretability and a principled treatment of systematic uncertainties via bootstrapping and shape variations. This work highlights the practical potential of data-driven, high-dimensional density estimation for complex final states and multi-parameter inference, with implications for future Higgs measurements and EFT operator constraints.
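To make the factorization and the extended likelihood concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: simple analytic densities stand in for the generative models, $\boldsymbol{x}'$ is a single feature that does not depend on $m$, the event counts are fixed rather than Poisson-fluctuated, and a coarse grid scan replaces a proper minimizer. All names and numbers (`p_sig`, `p_bkg`, `extended_nll`, the mass window, the yields) are hypothetical choices made for illustration.

```python
import numpy as np

# Toy constants: illustrative assumptions, not values from the paper.
M_LO, M_HI = 100.0, 150.0      # mass window
M_PEAK, M_RES = 125.0, 1.5     # signal peak position and mass resolution
SLOPE = 0.02                   # slope of the falling background mass spectrum

def gauss(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def p_sig(x, m):
    # P_S(x'|m) * P_S(m); x' is independent of m here, a toy simplification.
    return gauss(x, 1.0, 1.0) * gauss(m, M_PEAK, M_RES)

def p_bkg(x, m):
    # P_B(x'|m) * P_B(m), with an exponential mass spectrum normalized on the window.
    norm = (np.exp(-SLOPE * M_LO) - np.exp(-SLOPE * M_HI)) / SLOPE
    return gauss(x, -1.0, 1.0) * np.exp(-SLOPE * m) / norm

def sample(rng, n_sig, n_bkg):
    m_s = rng.normal(M_PEAK, M_RES, n_sig)
    x_s = rng.normal(1.0, 1.0, n_sig)
    # Inverse-CDF sampling of the truncated exponential on [M_LO, M_HI).
    u = rng.uniform(size=n_bkg)
    a, b = np.exp(-SLOPE * M_LO), np.exp(-SLOPE * M_HI)
    m_b = -np.log(a - u * (a - b)) / SLOPE
    x_b = rng.normal(-1.0, 1.0, n_bkg)
    return np.concatenate([x_s, x_b]), np.concatenate([m_s, m_b])

def extended_nll(nu_s, nu_b, ps, pb):
    # Extended unbinned negative log-likelihood, up to a constant:
    # (nu_s + nu_b) - sum_i log(nu_s * P_S(x_i) + nu_b * P_B(x_i))
    return (nu_s + nu_b) - np.sum(np.log(nu_s * ps + nu_b * pb))

def fit(x, m, nu_s_grid, nu_b_grid):
    # Brute-force scan over both yields; returns (min NLL, nu_s, nu_b).
    ps, pb = p_sig(x, m), p_bkg(x, m)
    best = (np.inf, None, None)
    for nu_s in nu_s_grid:
        for nu_b in nu_b_grid:
            nll = extended_nll(nu_s, nu_b, ps, pb)
            if nll < best[0]:
                best = (nll, nu_s, nu_b)
    return best

rng = np.random.default_rng(0)
x, m = sample(rng, n_sig=50, n_bkg=2000)
nll_min, nu_s_hat, nu_b_hat = fit(
    x, m, np.arange(0.0, 151.0, 5.0), np.arange(1800.0, 2201.0, 10.0)
)
print("fitted signal yield:", nu_s_hat, "fitted background yield:", nu_b_hat)
```

In this toy, the per-event densities are evaluated once and reused for every point of the scan, which is what keeps an unbinned fit cheap once the densities are available. In a realistic setting the analytic `p_sig` and `p_bkg` would be replaced by learned density estimators, and the background yield would be profiled together with any nuisance parameters.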

Abstract

Crucial to many measurements at the LHC is the use of correlated multi-dimensional information to distinguish rare processes from large backgrounds, a task complicated by the poor modeling of many key backgrounds in Monte Carlo simulations. In this work, we introduce HI-SIGMA, a method to perform unbinned high-dimensional statistical inference with data-driven background distributions. In contrast to many applications of Simulation Based Inference in High Energy Physics, HI-SIGMA relies on generative ML models, rather than classifiers, to learn the signal and background distributions in the high-dimensional space. These ML models allow for interpretable inference while also incorporating model errors and other sources of systematic uncertainties. We showcase this methodology on a simplified version of a di-Higgs measurement in the $bb\gamma\gamma$ final state, where the di-photon resonance allows for background interpolation from sidebands into the signal region. We demonstrate that HI-SIGMA provides improved sensitivity compared to standard classifier-based methods, and that systematic uncertainties can be straightforwardly incorporated by extending methods that have been used for histogram-based analyses.
