Discriminative versus Generative Approaches to Simulation-based Inference

Benjamin Sluijter, Sascha Diefenbacher, Wahid Bhimji, Benjamin Nachman

TL;DR

The paper directly compares discriminative NSBI via likelihood-ratio classification and generative NSBI via direct density estimation for parameter inference in high-dimensional collider data. It formulates the problem as a mixture model $p(x|\mu,z)=\frac{\mu}{\mu+1}p_{\text{sig}}(x|z)+\frac{1}{\mu+1}p_{\text{back}}(x|z)$ with nuisance $z$ and targets the signal fraction $\mu$ using either direct density estimation with conditional normalizing flows or likelihood-ratio estimation with a parameterized classifier. Across Gaussian and Higgs datasets, both approaches recover $\mu$ with reasonable uncertainty; the likelihood-ratio method generally offers higher accuracy or precision within the explored hyperparameters, though both methods exhibit training-induced variability that motivates ensembling and calibration. The work demonstrates that NSBI can outperform histogram-based methods by exploiting unbinned, high-dimensional information, but practical deployment requires careful hyperparameter tuning, substantial computation, and ensemble strategies, especially for higher-dimensional problems. It also provides guidance for applying NSBI to collider physics and points to future directions in gradient-based inference and calibration techniques.
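
As a rough illustration of the mixture model above, the following sketch fits the signal fraction $\mu$ by maximizing the exact likelihood on a toy problem. It is not the paper's setup: the one-dimensional Gaussian signal and background shapes, the absence of a nuisance parameter $z$, the grid scan, and all function names are assumptions made for illustration. In the paper's direct-density approach, a conditional normalizing flow would take the place of the analytic `mixture_pdf`.

```python
import numpy as np

def gaussian_pdf(x, mean, sigma=1.0):
    """Density of a 1D normal distribution (toy stand-in for p_sig / p_back)."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, mu, sig_mean=1.0, back_mean=-1.0):
    """p(x|mu) = mu/(mu+1) * p_sig(x) + 1/(mu+1) * p_back(x)."""
    f_sig = mu / (mu + 1.0)  # signal weight; the background weight is 1/(mu+1)
    return f_sig * gaussian_pdf(x, sig_mean) + (1.0 - f_sig) * gaussian_pdf(x, back_mean)

def sample_mixture(n, mu, rng, sig_mean=1.0, back_mean=-1.0):
    """Draw n events, each being signal with probability mu/(mu+1)."""
    is_signal = rng.random(n) < mu / (mu + 1.0)
    means = np.where(is_signal, sig_mean, back_mean)
    return rng.normal(means, 1.0)

def fit_mu(x, mu_grid):
    """Scan the negative log-likelihood over a grid and return its minimizer."""
    nll = np.array([-np.sum(np.log(mixture_pdf(x, mu))) for mu in mu_grid])
    return mu_grid[np.argmin(nll)], nll

rng = np.random.default_rng(0)
mu_true = 0.5                          # hypothetical true value for the toy
x = sample_mixture(20000, mu_true, rng)
mu_grid = np.linspace(0.05, 2.0, 391)  # grid spacing of 0.005
mu_hat, nll = fit_mu(x, mu_grid)
print(f"true mu = {mu_true}, fitted mu = {mu_hat:.3f}")
```

The grid scan keeps the example dependency-free; a gradient-based minimizer would serve the same purpose.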

Abstract

Most of the fundamental, emergent, and phenomenological parameters of particle and nuclear physics are determined through parametric template fits. Simulations are used to populate histograms which are then matched to data. This approach is inherently lossy, since histograms are binned and low-dimensional. Deep learning has enabled unbinned and high-dimensional parameter estimation through neural likelihood(-ratio) estimation. We compare two approaches for neural simulation-based inference (NSBI): one based on discriminative learning (classification) and one based on generative modeling. These two approaches are directly evaluated on the same datasets, with a similar level of hyperparameter optimization in both cases. In addition to a Gaussian dataset, we study NSBI using a Higgs boson dataset from the FAIR Universe Challenge. We find that both the direct likelihood and likelihood ratio estimation are able to effectively extract parameters with reasonable uncertainties. For the numerical examples and within the set of hyperparameters studied, we find that the likelihood ratio method is more accurate and/or precise. Both methods have a significant spread from the network training and would require ensembling or other mitigation strategies in practice.
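
The discriminative approach rests on a standard identity: a classifier trained with binary cross-entropy on balanced samples from two hypotheses has optimal output $s(x)=p_1(x)/(p_0(x)+p_1(x))$, so that $s/(1-s)$ estimates the likelihood ratio $p_1(x)/p_0(x)$. The sketch below checks this on a toy case and should be read under several assumptions: a plain logistic regression stands in for the neural classifier, the two hypotheses are fixed unit-width Gaussians rather than a parameterized family in $(\mu, z)$, and all names and settings are hypothetical. For these Gaussians the exact log ratio is $2x$, so the learned slope should come out close to 2 and the intercept close to 0.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_logistic(x, y, lr=1.0, n_steps=2000):
    """Minimize the mean binary cross-entropy of s(x) = sigmoid(w*x + b)
    with full-batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        s = sigmoid(w * x + b)
        w -= lr * np.mean((s - y) * x)  # d(BCE)/dw
        b -= lr * np.mean(s - y)        # d(BCE)/db
    return w, b

rng = np.random.default_rng(1)
n = 10000
x0 = rng.normal(-1.0, 1.0, n)  # samples from hypothesis 0 (label 0)
x1 = rng.normal(+1.0, 1.0, n)  # samples from hypothesis 1 (label 1)
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(n), np.ones(n)])

w, b = train_logistic(x, y)

# Likelihood-ratio trick: log[s / (1 - s)] is the logit, here w*x + b.
x_test = np.linspace(-2.0, 2.0, 9)
log_ratio_est = w * x_test + b
log_ratio_true = 2.0 * x_test  # log N(x; 1, 1) - log N(x; -1, 1)
print(f"w = {w:.3f} (exact 2), b = {b:.3f} (exact 0)")
print("max |estimated - exact| log ratio:", np.max(np.abs(log_ratio_est - log_ratio_true)))
```

Because the exact log ratio is linear in this toy, a linear model suffices; in higher dimensions, or with dependence on nuisance parameters, a more flexible classifier is needed.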

Paper Structure

This paper contains 18 sections, 7 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Visualization of the Gaussian data for rotation values $z=0.5$ (left-hand side) and $z=0.25 \pi$ (right-hand side), as well as for the large distance case (upper panels) and the small distance case (lower panels).
  • Figure 2: Visualization of the features used for the Higgs data. The left-hand plot shows the invariant mass of the hadronic $\tau$ and non-$\tau$ lepton, the right-hand plot shows the $p_{T}$ ratio of the hadronic $\tau$ and non-$\tau$ lepton. The solid lines show the features for the nominal TES value, while the dashed and dash-dotted lines show them for smaller and larger TES values, respectively.
  • Figure 3: Coverage values as a function of the ratio $k$ between the size of an evaluation set and the size of the bootstrapped subset ($N$). Inference is performed on either $\mu$ or $z$ as indicated in the legend. Results are shown for both the small-distance ($r=0.5$) and large-distance ($r=2$) Gaussian datasets.
  • Figure 4: Gaussian large distance case results for the DLE (learning rate = $3\times 10^{-6}$) and LRE model (learning rate = $3\times 10^{-5}$). Histograms show the performance of 30 independently trained models of both the LRE model and the DLE model. For the coverage and mean width results, the plots are split into an upper section for the model results and a lower section for the ground truth model, indicating approximately what the histograms should look like if all 30 models were the perfect model.
  • Figure 5: Gaussian small distance case results for the DLE (learning rate = $10^{-5}$) and LRE model (learning rate = $3\times 10^{-5}$). Histograms show the performance of 30 independently trained models of both the LRE model and the DLE model. For the coverage and mean width results, the plots are split into an upper section for the model results and a lower section for the ground truth model, indicating approximately what the histograms should look like if all 30 models were the perfect model.
  • ...and 3 more figures