The Landscape of Unfolding with Machine Learning

Nathan Huetsch; Javier Mariño Villadamigo; Alexander Shmakov; Sascha Diefenbacher; Vinicius Mikuni; Theo Heimel; Michael Fenton; Kevin Greif; Benjamin Nachman; Daniel Whiteson; Anja Butter; Tilman Plehn

TL;DR

The paper addresses two unfolding tasks in high-energy physics, removing detector effects and mapping observations back to parton-level information, by surveying three families of ML-based methods: reweighting (OmniFold and Bayesian variants), distribution mapping (Schrödinger Bridge and Direct Diffusion), and conditional generative unfolding (cINN, Transfermer, CFM, TraCFM, Latent Diffusion). By benchmarking these methods on identical datasets, the authors show that each approach reproduces particle- and parton-level distributions with percent-level accuracy across complex observables, while offering complementary strengths and uncertainty quantification. The study demonstrates that unbinned, high-dimensional cross-section measurements are practically viable, with concrete applications to $Z$+jets detector unfolding and top-quark pair production, which broadens community access to the results and may provide sensitivity to new phenomena. The results point to a versatile ML toolkit for future SM tests and global analyses that combines model-agnostic reweighting, distribution mapping, and physics-informed generative modeling. In practice, these methods reduce reliance on expensive forward simulations and enable precise, multi-dimensional unfolding in current collider data analyses.

Abstract

Recent innovations from machine learning allow for data unfolding without binning and including correlations across many dimensions. We describe a set of known, upgraded, and new methods for ML-based unfolding. The performance of these approaches is evaluated on the same two datasets. We find that all techniques are capable of accurately reproducing the particle-level spectra across complex observables. Given that these approaches are conceptually diverse, they offer an exciting toolkit for a new class of measurements that can probe the Standard Model with an unprecedented level of detail and may enable sensitivity to new phenomena.

Paper Structure (31 sections, 53 equations, 12 figures, 5 tables)

Introduction
ML-Unfolding
Reweighting: (b)OmniFold
Bayesian network
Mapping distributions: Schrödinger Bridge and Direct Diffusion
Schrödinger Bridge
Direct Diffusion
Unpaired DiDi
Bayesian network
Generative unfolding: cINN, Transfermer, CFM, TraCFM, Latent Diffusion
Conditional INN
Transformer-cINN
Conditional Flow Matching
Transformer-CFM
Bayesian generative network
...and 16 more sections

Figures (12)

Figure 1: TraCFM architecture, combining the CFM generator with a Transformer encoder-decoder to improve the treatment of combinatorics.
Figure 2: Subjet distributions for the $Z$+jets dataset at the particle level and at the reco level.
Figure 3: Unfolded distributions from event reweighting using OmniFold and bOmniFold. The bOmniFold error bar is based on drawing 20 Bayesian samples. For OmniFold the error bar represents the bin-wise statistical uncertainty.
Figure 4: BCE losses during training for 500 epochs for OmniFold (green) and bOmniFold (red), for Herwig-to-Pythia reweighting.
Figure 5: Weight distribution (clipped at 200) in the training set for Herwig-to-Pythia reweighting: OmniFold (left) vs bOmniFold (right). For each network we histogram the weights for the Herwig and Pythia data points.
...and 7 more figures
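
The reweighting captions above (Figures 3 to 5) refer to classifier-based reweighting: a classifier is trained with a binary cross-entropy (BCE) loss to separate two samples, and its output p(x) is converted into per-event weights w(x) = p(x) / (1 - p(x)). The following is a minimal sketch of a single such reweighting step, not the authors' networks or the iterative OmniFold procedure. Everything specific in it is an assumption made for illustration: the one-dimensional Gaussian toy samples standing in for simulation and data, the logistic-regression classifier, and the helper names `train_classifier` and `likelihood_ratio_weights`.

```python
import math
import random


def train_classifier(sim, data, epochs=300, lr=0.5):
    """Fit p(x) = sigmoid(a*x + b) by gradient descent on the mean BCE loss.

    Label 0 marks the sample to be reweighted (sim), label 1 the target (data).
    """
    samples = [(x, 0.0) for x in sim] + [(x, 1.0) for x in data]
    n = len(samples)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            # Gradient of the BCE loss with respect to the logit is (p - y).
            grad_a += (p - y) * x
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def likelihood_ratio_weights(points, a, b):
    """Convert classifier outputs into weights w = p / (1 - p) = exp(a*x + b)."""
    return [math.exp(a * x + b) for x in points]


# Hypothetical toy samples: "sim" ~ N(0, 1) is reweighted towards "data" ~ N(0.5, 1).
rng = random.Random(0)
sim = [rng.gauss(0.0, 1.0) for _ in range(2000)]
data = [rng.gauss(0.5, 1.0) for _ in range(2000)]

a, b = train_classifier(sim, data)
weights = likelihood_ratio_weights(sim, a, b)

sim_mean = sum(sim) / len(sim)
data_mean = sum(data) / len(data)
weighted_mean = sum(w * x for w, x in zip(weights, sim)) / sum(weights)

print("unweighted sim mean:", sim_mean)
print("reweighted sim mean:", weighted_mean)
print("data mean:          ", data_mean)
```

For this toy the exact log-likelihood ratio is linear in x, so a logistic regression is sufficient; realistic multi-dimensional phase spaces need a more expressive classifier. After reweighting, the mean of the toy simulation moves close to the mean of the toy data.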
