An unfolding method based on conditional Invertible Neural Networks (cINN) using iterative training

Mathias Backes, Anja Butter, Monica Dunford, Bogdan Malaescu

TL;DR

The paper tackles detector unfolding in high dimensions, addressing the biases that arise from imperfect simulations. It introduces the iterative conditional invertible neural network (IcINN), which learns the posterior $p(x|y)$ with a cINN and refines the simulation through iterative reweighting to better match the data. Demonstrations on a 1D Gaussian toy model and on EFT-driven $pp \rightarrow Z\gamma\gamma$ pseudo-data (including a 2D unfolding over $p_T^-$ and $p_T^+$) show a reduced data–MC bias while preserving event-by-event probabilistic unfolding. The work analyzes statistical uncertainties and correlations and provides a public codebase to enable application to real data.
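For orientation, a cINN of this kind is commonly trained by maximizing the likelihood of the particle-level events $x$ under the network's conditional density given the detector-level event $y$. Writing $\bar g(x;y)$ for the network direction that maps $x$ to the latent variables $z$ (the inverse of the generative direction $z \to x$), the change-of-variables formula with a standard-normal latent space gives the loss below. This is the standard maximum-likelihood form, assumed here for illustration; the paper's own loss expression may differ in details such as regularization terms.

$$
p(x|y) = p_Z\big(\bar g(x;y)\big)\left|\det \frac{\partial \bar g(x;y)}{\partial x}\right|, \qquad
L = -\big\langle \log p(x|y) \big\rangle_{(x,y)\sim \mathrm{MC}} = \left\langle \tfrac{1}{2}\big\|\bar g(x;y)\big\|^2 - \log\left|\det \frac{\partial \bar g(x;y)}{\partial x}\right| \right\rangle + \mathrm{const.}
$$

Because the average runs over simulated pairs $(x,y)$, the learned posterior satisfies $p(x|y) \propto p(y|x)\,p_\mathrm{MC}(x)$ and therefore inherits the MC prior, which is the source of the bias that the iterative reweighting is meant to remove.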

Abstract

The unfolding of detector effects is crucial for the comparison of data to theory predictions. While traditional methods are limited to representing the data in a low number of dimensions, machine learning has enabled new unfolding techniques that retain the full dimensionality. Generative networks such as invertible neural networks (INN) enable a probabilistic unfolding that maps individual events to their corresponding unfolded probability distributions. However, the accuracy of such methods is limited by how well the simulated training samples model the actual data being unfolded. We introduce the iterative conditional INN (IcINN) for unfolding, which adjusts for deviations between simulated training samples and data. The IcINN unfolding is first validated on toy data and then applied to pseudo-data for the $pp \to Z\gamma\gamma$ process.
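The bias towards the simulation, and how iterative reweighting removes it, can be made concrete in a one-dimensional Gaussian toy in the spirit of the paper's toy model. The sketch below assumes a Gaussian MC truth, a Gaussian data truth, and additive Gaussian smearing $y = x + \epsilon$ with known variance; all numerical values are hypothetical, not the paper's. Under these assumptions the exact posterior $p(x|y)$ is Gaussian, so the distribution obtained by unfolding every data event can be computed in closed form, and reweighting the MC truth to that result simply makes it the prior of the next iteration. The cINN itself is not modeled: it is assumed to learn the exact posterior, and `unfold_analytic` and `iterate` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float


def unfold_analytic(prior: Gaussian, data_truth: Gaussian, smear_var: float) -> Gaussian:
    """Distribution of x obtained by sampling p(x|y) under `prior` for every data event y.

    Assumes reco y = x + N(0, smear_var) for both MC and data (identical detector response).
    """
    k = prior.var / (prior.var + smear_var)    # weight of the measurement relative to the prior
    post_var = k * smear_var                   # width of p(x|y); identical for every y
    reco_var = data_truth.var + smear_var      # width of the data reco distribution
    mean = prior.mean + k * (data_truth.mean - prior.mean)  # pulled towards the prior mean
    var = k * k * reco_var + post_var          # spread of posterior means plus posterior width
    return Gaussian(mean, var)


def iterate(mc_truth: Gaussian, data_truth: Gaussian, smear_var: float, n_iter: int) -> list[Gaussian]:
    """Unfolded result for iterations 0..n_iter of the train/unfold/reweight loop."""
    prior = mc_truth
    results = []
    for _ in range(n_iter + 1):
        unfolded = unfold_analytic(prior, data_truth, smear_var)
        results.append(unfolded)
        prior = unfolded  # reweighting MC truth to the unfolded result makes it the new prior
    return results


if __name__ == "__main__":
    history = iterate(Gaussian(0.0, 1.0), Gaussian(1.0, 1.0), smear_var=1.0, n_iter=3)
    print([h.mean for h in history])  # [0.5, 0.75, 0.875, 0.9375]
```

With equal MC and data widths and unit smearing, the shrinkage factor is $k = 1/2$, so the remaining bias towards the MC truth halves in every iteration. In this idealized setting the fixed point of the loop is the data truth itself (mean and variance), which mirrors the behavior described for the toy model.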

Paper Structure (11 sections, 18 equations, 13 figures, 2 tables)

Figures (13)

  • Figure 1: Structure of the conditional INN. Random numbers $\lbrace z \rbrace$ are mapped to particle-level events $\lbrace x \rbrace$ under the condition of a detector-level event $\lbrace y \rbrace$. The loss $L$ is the cINN loss defined in the paper. A tilde indicates a cINN-generated event.
  • Figure 2: Illustration of the iterative cINN unfolding algorithm. First, the cINN is trained on the current Monte Carlo (MC) simulation. Second, the cINN unfolds the experimentally measured distribution. Third, the MC simulation is reweighted to match the unfolded distribution at particle level. This procedure is iterated, each time with the updated, reweighted MC simulation.
  • Figure 3: Gaussian toy example used to demonstrate the IcINN algorithm. The left panel shows all relevant distributions of the model: the data truth (red, solid), the data reco (red, dashed), the MC truth (blue, solid), and the MC reco (blue, dashed), each with $10^6$ sampled events. On the right, the cINN unfolding is applied to the model; the resulting unfolded distribution (purple, solid) is biased towards the MC truth.
  • Figure 4: Iterative unfolding results for the one-dimensional toy model. On the left, the upper panel shows the MC and data truth as well as the unfolding result in each iteration (solid lines) together with its analytic prediction (dashed lines). The lower panel shows the ratio of the cINN result to the data truth, which makes clear that the bias towards the MC is reduced with each iteration. On the right, we show the unfolded distribution for a single event at $y_m=5$; again, the result after each iteration (solid histogram lines) is very close to its analytic prediction (dashed lines).
  • Figure 5: Relative statistical uncertainties of the IcINN before any reweighting, i.e. for iteration $i=0$ (left), and after two reweightings, i.e. for iteration $i=2$ (right), evaluated with no fluctuations of the input distributions (green histogram), with fluctuations from data (blue dashed line), from MC (black dotted line), and in total (red histogram). To derive these uncertainties, each event is unfolded 30 times and $N_\text{toys}=400$ bootstrap replicas are used.
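The bootstrap procedure behind the statistical uncertainties can be illustrated for the data-fluctuation component of a one-dimensional unfolded histogram: each data event is unfolded a fixed number of times, the events are resampled with replacement to form $N_\text{toys}$ replicas, and the per-bin standard deviation across replicas is divided by the per-bin mean. This is a sketch under stated assumptions, not the paper's implementation: the trained cINN is replaced by a hypothetical exact Gaussian posterior (`unfold_event`), the helper names and all parameter values are illustrative, and the MC-fluctuation component, which would require retraining on resampled simulation, is not modeled.

```python
import bisect
import math
import random
import statistics


def unfold_event(y, prior_mean, prior_var, smear_var, n_samples, rng):
    """Hypothetical stand-in for a trained cINN: draw n_samples values of x ~ p(x|y).

    Assumes a Gaussian prior and additive Gaussian smearing, so p(x|y) is Gaussian.
    """
    k = prior_var / (prior_var + smear_var)
    mean = prior_mean + k * (y - prior_mean)
    sd = math.sqrt(k * smear_var)
    return [rng.gauss(mean, sd) for _ in range(n_samples)]


def event_histogram(samples, edges):
    """Sparse histogram {bin: weight} of one event's samples; each event carries total weight <= 1."""
    counts = {}
    for x in samples:
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(edges) - 1:  # drop samples outside the binning range
            counts[i] = counts.get(i, 0.0) + 1.0 / len(samples)
    return counts


def bootstrap_rel_uncertainty(per_event, n_bins, n_toys, rng):
    """Relative per-bin standard deviation of the unfolded histogram over bootstrap replicas."""
    n_events = len(per_event)
    replicas = []
    for _ in range(n_toys):
        hist = [0.0] * n_bins
        for j in rng.choices(range(n_events), k=n_events):  # resample events with replacement
            for b, c in per_event[j].items():
                hist[b] += c
        replicas.append(hist)
    rel = []
    for b in range(n_bins):
        values = [h[b] for h in replicas]
        mean = statistics.fmean(values)
        rel.append(statistics.stdev(values) / mean if mean > 0 else float("nan"))
    return rel


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical pseudo-data: truth x ~ N(1, 1), reco y = x + N(0, 1)
    data_reco = [rng.gauss(1.0, 1.0) + rng.gauss(0.0, 1.0) for _ in range(1000)]
    edges = [-3.0 + i for i in range(8)]  # 7 unit-width bins from -3 to 4
    per_event = [event_histogram(unfold_event(y, 0.0, 1.0, 1.0, 30, rng), edges) for y in data_reco]
    rel = bootstrap_rel_uncertainty(per_event, len(edges) - 1, n_toys=400, rng=rng)
    print([round(r, 3) for r in rel])
```

Here the 30 samples per event are drawn once, before resampling, so the replica spread reflects only the finite size of the data sample and not the generator's own sampling noise; redrawing samples per replica, or resampling and retraining on the MC, would be natural extensions towards the other components.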