The Landscape of Unfolding with Machine Learning

Nathan Huetsch, Javier Mariño Villadamigo, Alexander Shmakov, Sascha Diefenbacher, Vinicius Mikuni, Theo Heimel, Michael Fenton, Kevin Greif, Benjamin Nachman, Daniel Whiteson, Anja Butter, Tilman Plehn

TL;DR

The paper tackles the challenge of unfolding detector effects and translating observations to parton-level information in high-energy physics by surveying three ML-based families: reweighting (OmniFold and Bayesian variants), distribution mapping (Schrödinger Bridge and Direct Diffusion), and conditional generative unfolding (cINN, Transfermer, CFM, TraCFM, Latent Diffusion). By benchmarking these methods on identical datasets, the authors demonstrate that each approach can reproduce particle- and parton-level distributions with percent-level accuracy across complex observables, while offering complementary strengths and uncertainty quantification. The study shows practical viability for unbinned, high-dimensional cross-section measurements, enabling broader community access and potential sensitivity to new phenomena, with concrete extensions to $Z$+jets detector unfolding and top-quark pair production. The results suggest a versatile ML toolkit for future SM tests and global analyses, combining model-agnostic reweighting, distribution-mapping, and physics-informed generative modeling. These advances have significant practical impact by reducing reliance on expensive forward simulations and enabling precise, multi-dimensional unfolding in contemporary collider data analysis.

Abstract

Recent innovations from machine learning allow for data unfolding without binning, including correlations across many dimensions. We describe a set of known, upgraded, and new methods for ML-based unfolding. The performance of these approaches is evaluated on the same two datasets. We find that all techniques are capable of accurately reproducing the particle-level spectra across complex observables. Given that these approaches are conceptually diverse, they offer an exciting toolkit for a new class of measurements that can probe the Standard Model with an unprecedented level of detail and may enable sensitivity to new phenomena.

Paper Structure

This paper contains 31 sections, 53 equations, 12 figures, and 5 tables.

Figures (12)

  • Figure 1: TraCFM architecture, combining the CFM generator with a Transformer encoder-decoder combination to improve combinatorics.
  • Figure 2: Subjet distributions for the $Z$+jets dataset at the particle level and at the reco level.
  • Figure 3: Unfolded distributions from event reweighting using OmniFold and bOmniFold. The bOmniFold error bar is based on drawing 20 Bayesian samples. For OmniFold the error bar represents the bin-wise statistical uncertainty.
  • Figure 4: BCE losses during training for 500 epochs for OmniFold (green) and bOmniFold (red), for Herwig-to-Pythia reweighting.
  • Figure 5: Weight distribution (clipped at 200) in the training set for Herwig-to-Pythia reweighting: OmniFold (left) vs bOmniFold (right). For each network we histogram the weights for the Herwig and Pythia data points.
  • ...and 7 more figures
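
The reweighting captions above (Figures 3 to 5) refer to classifiers trained with a binary cross-entropy (BCE) loss. As a rough illustration of the underlying idea only, and not the OmniFold or bOmniFold implementation, the sketch below trains a one-dimensional logistic classifier to separate two toy Gaussian samples and converts its output p(x) into per-event weights w(x) = p(x) / (1 - p(x)), which approximates the density ratio when both samples have equal size. The Gaussian samples stand in for the two generators, and every name (`fit_logistic`, `reweight`, `sim`, `target`) is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """Minimise the mean BCE of p(x) = sigmoid(a*x + b) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a * x + b) - y  # derivative of BCE w.r.t. the logit
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def reweight(xs, a, b):
    """Per-event weights w = p / (1 - p), which equals exp(a*x + b) here."""
    return [math.exp(a * x + b) for x in xs]


# Toy stand-ins (assumption): "sim" plays the sample to be reweighted,
# "target" the sample it should be made to resemble.
random.seed(0)
n_events = 2000
sim = [random.gauss(0.0, 1.0) for _ in range(n_events)]
target = [random.gauss(0.5, 1.0) for _ in range(n_events)]

# Label sim events 0 and target events 1, then train the classifier.
a, b = fit_logistic(sim + target, [0.0] * n_events + [1.0] * n_events)

weights = reweight(sim, a, b)
unweighted_mean = sum(sim) / n_events
weighted_mean = sum(w * x for w, x in zip(weights, sim)) / sum(weights)
print(f"sim mean before: {unweighted_mean:.3f}, after reweighting: {weighted_mean:.3f}")
```

For two unit-width Gaussians the log density ratio is linear in x, so a logistic model suffices in this toy case; the methods surveyed in the paper use neural networks on high-dimensional inputs, and OmniFold additionally iterates between reco-level and particle-level reweighting steps.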