
Unbinned Inference with Correlated Events

Krish Desai, Owen Long, Benjamin Nachman

TL;DR

The paper tackles the problem that event correlations induced by unfolding invalidate the independence assumption behind standard unbinned inference. It uses OmniFold to perform unbinned unfolding and then conducts parameter inference on Gaussian toy models in 1D to 6D, comparing unbinned ML and binned $\chi^2$ approaches with full or diagonal covariances. It shows that ignoring unfolding-induced correlations can significantly underestimate uncertainties, while numerical approaches (e.g., bootstrap) yield valid coverage; in higher dimensions, the RMS uncertainty systematically exceeds asymptotic estimates by about 18–28%. The work provides practical guidance for analyzing unbinned unfolded data, arguing against relying on asymptotic formulas until a proper formalism for correlated events is developed, and it offers code and concrete recommendations for covariance-aware inference.
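The binned comparison (full covariance versus its diagonal) can be illustrated with the linearized $\chi^2$ error formula: the parameter covariance is $(J^\top C^{-1} J)^{-1}$, where $J$ holds the derivatives of the expected bin counts with respect to $(\mu, \sigma^2)$ and $C$ is the bin covariance. The following is a minimal sketch, not the authors' code. It assumes Gaussian bin expectations evaluated at the true values quoted for Figure 3 ($\mu=0.2$, $\sigma^2=0.81$), along with a hypothetical covariance that has a uniform nearest-neighbour correlation `rho`. In the paper the covariance comes from the unfolding itself. All helper names are illustrative.

```python
import math

import numpy as np


def gaussian_bin_fractions(mu, var, edges):
    """Expected fraction of a Gaussian(mu, var) falling in each bin."""
    sigma = math.sqrt(var)
    cdf = np.array([0.5 * (1.0 + math.erf((e - mu) / (sigma * math.sqrt(2.0)))) for e in edges])
    return np.diff(cdf)


def count_jacobian(mu, var, edges, n_events, eps=1e-5):
    """Central finite differences of expected bin counts w.r.t. (mu, var)."""
    cols = []
    for d_mu, d_var in ((eps, 0.0), (0.0, eps)):
        up = gaussian_bin_fractions(mu + d_mu, var + d_var, edges)
        dn = gaussian_bin_fractions(mu - d_mu, var - d_var, edges)
        cols.append(n_events * (up - dn) / (2.0 * eps))
    return np.column_stack(cols)  # shape (n_bins, 2)


def toy_correlated_covariance(counts, rho):
    """Hypothetical covariance: Poisson variances plus correlation rho between
    neighbouring bins (positive definite for |rho| < 0.5)."""
    n = len(counts)
    corr = np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1))
    s = np.sqrt(counts)
    return corr * np.outer(s, s)


def chi2_asymptotic_errors(J, cov):
    """Linearized chi^2: parameter covariance = (J^T C^-1 J)^-1; return its sqrt-diagonal."""
    fisher = J.T @ np.linalg.solve(cov, J)
    return np.sqrt(np.diag(np.linalg.inv(fisher)))


edges = np.linspace(-4.0, 4.0, 41)  # 40 bins on [-4, 4]
n_events = 10_000                   # hypothetical sample size
mu_true, var_true = 0.2, 0.81
counts = n_events * gaussian_bin_fractions(mu_true, var_true, edges)
J = count_jacobian(mu_true, var_true, edges, n_events)
cov = toy_correlated_covariance(counts, rho=0.3)

err_full = chi2_asymptotic_errors(J, cov)                    # uses correlations
err_diag = chi2_asymptotic_errors(J, np.diag(np.diag(cov)))  # diagonal approximation
```

With `rho = 0` both calls return the same errors, which for fine bins approach the unbinned values $\sqrt{\sigma^2/N}$ and $\sigma^2\sqrt{2/N}$. With nonzero `rho` they differ. Whether the diagonal approximation over- or underestimates depends on the sign and structure of the correlations, so neither choice is conservative a priori.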

Abstract

Modern machine learning has enabled parameter inference from event-level data without the need to first summarize all events with a histogram. All of these unbinned inference methods make use of the fact that the events are statistically independent so that the log likelihood is a sum over events. However, this assumption is not valid for unbinned inference on unfolded data, where the deconvolution process induces a correlation between events. We explore the impact of event correlations on downstream inference tasks in the context of the OmniFold unbinned unfolding method. We find that uncertainties may be significantly underestimated when event correlations are excluded from uncertainty quantification.
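For the Gaussian toy model, the independence assumption takes a concrete form. Each unfolded event carries a weight $w_i$, and the log-likelihood is written as one term per event, $\log L(\mu,\sigma^2) = \sum_i w_i \log \mathcal{N}(x_i \mid \mu, \sigma^2)$. The sketch below is minimal and is not the authors' implementation; the function names are hypothetical. It computes the weighted maximum-likelihood estimate and the inverse-Hessian ("asymptotic") errors of this summed likelihood. These asymptotic errors are precisely the kind of estimate that the paper finds can understate the true spread once unfolding correlates the events.

```python
import numpy as np


def weighted_gaussian_nll(mu, var, x, w):
    """-sum_i w_i * log N(x_i | mu, var): one independent term per event."""
    return 0.5 * np.sum(w * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var))


def weighted_gaussian_mle(x, w):
    """Closed-form minimizer of weighted_gaussian_nll."""
    sw = np.sum(w)
    mu = np.sum(w * x) / sw
    var = np.sum(w * (x - mu) ** 2) / sw
    return mu, var


def naive_asymptotic_errors(w, var):
    """Inverse-Hessian errors on (mu, var) at the MLE.

    At the minimum the Hessian of the NLL is diag(sum(w)/var, sum(w)/(2 var^2)),
    so its inverse gives var/sum(w) and 2 var^2/sum(w). This treats every event
    as independent; with non-unit weights even the independent-event error needs
    a sandwich correction, and event-to-event correlations are ignored entirely.
    """
    sw = np.sum(w)
    return np.sqrt(var / sw), np.sqrt(2.0 * var ** 2 / sw)


rng = np.random.default_rng(1)
x = rng.normal(0.2, 0.9, size=1000)  # toy events; true mu = 0.2, var = 0.81
w = np.ones_like(x)                  # unit weights for the demonstration
mu_hat, var_hat = weighted_gaussian_mle(x, w)
err_mu, err_var = naive_asymptotic_errors(w, var_hat)
```

With unit weights, these errors reduce to the textbook $\sigma/\sqrt{N}$ and $\sigma^2\sqrt{2/N}$.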

Paper Structure

This paper contains 19 sections, 9 equations, and 13 figures.

Figures (13)

  • Figure 1: Histograms showing the datasets from the one-dimensional study. The distributions on the left show the true distribution of the observable compared with the gen-particle distribution of the Monte Carlo. The center histograms show a sample detector-level distribution of the observable compared with the sim-particle distribution of the Monte Carlo. The right histogram shows the resolution function. For these histograms, we used 31 bins in the range [-5, 5].
  • Figure 2: Mean asymptotic error versus detector smearing for the (a) Gaussian mean, $\mu$, and (b) the variance $\sigma^2$ obtained from the full covariance analysis (green circles) and the diagonal covariance approximation (pink squares). The green stars represent the standard deviation computed from the spread of the best-fit values over 500 bootstrap replicas. Note that the numerical uncertainty agrees well with the asymptotic error from the full covariance fit while the diagonal approximation consistently overestimates the uncertainty at larger smearing values.
  • Figure 3: The mean best-fit value of (a) $\mu$ and (b) $\sigma^2$ is shown as a function of the detector smearing, with error bars indicating the standard error of the mean (SEM) over 500 bootstrap test datasets. Horizontal dashed red lines mark the true values of $\mu=0.2$ and $\sigma^2=0.81$. Both fit methods (full covariance and diagonal approximation) yield similar central values.
  • Figure 4: Average weight correlation between two events as a function of the absolute distance between the events in the observable. The four curves show different values for the detector resolution. The top plots show unfolding with the KDE approach within OmniFold, while the bottom plots show results from using NNs within OmniFold. The error bars show the RMS of the correlation values.
  • Figure 5: Covariance matrices of a histogram of the unfolding output for four detector resolution values (0, 0.25, 0.50, 0.75). The top plots show unfolding with the KDE approach within OmniFold, while the bottom plots show results from using NNs within OmniFold. The histogram has 40 bins in the range [-4, 4].
  • ...and 8 more figures
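Figures 4 and 5 show two quantities, the pairwise weight correlation versus event separation and the covariance of a histogram of the unfolding output. Both can be estimated from an ensemble of unfolding outputs for the same events. The sketch below is minimal. It assumes one has R replicas of per-event weights, for example from bootstrap re-runs of the unfolding, because this summary does not state the exact procedure. The synthetic weights are a hypothetical stand-in, not OmniFold output, and the function names are illustrative.

```python
import numpy as np


def histogram_covariance(x, weight_replicas, edges):
    """Covariance of the weighted histogram of x across replicas.

    weight_replicas has shape (n_replicas, n_events); row r holds the per-event
    weights of replica r (assumed to come from repeating the unfolding).
    """
    hists = np.array([np.histogram(x, bins=edges, weights=w)[0] for w in weight_replicas])
    return np.cov(hists, rowvar=False)  # shape (n_bins, n_bins)


def weight_correlation_vs_distance(x, weight_replicas, dist_edges):
    """Mean Pearson correlation of event weights (across replicas), binned in |x_i - x_j|."""
    corr = np.corrcoef(weight_replicas, rowvar=False)  # (n_events, n_events)
    i, j = np.triu_indices(len(x), k=1)                # each unordered pair once
    dist = np.abs(x[i] - x[j])
    c = corr[i, j]
    which = np.digitize(dist, dist_edges) - 1          # distance-bin index of each pair
    n_bins = len(dist_edges) - 1
    mean = np.full(n_bins, np.nan)                     # NaN where a bin has no pairs
    for b in range(n_bins):
        sel = which == b
        if sel.any():
            mean[b] = c[sel].mean()
    return mean


rng = np.random.default_rng(0)
x = rng.normal(0.2, 0.9, size=200)
# Hypothetical replicas: each shifts all weights by a random linear function of x,
# so events that are close in x receive strongly correlated weights.
a = rng.normal(0.0, 0.1, size=(500, 1))
b = rng.normal(0.0, 0.1, size=(500, 1))
weights = 1.0 + a + b * x  # shape (500, 200)

cov = histogram_covariance(x, weights, np.linspace(-4.0, 4.0, 41))  # 40 bins on [-4, 4]
corr_by_dist = weight_correlation_vs_distance(x, weights, np.array([0.0, 0.1, 1.0, 2.0, 5.0]))
```

If the already-unfolded $(x, w)$ pairs were resampled as though they were independent, these correlations could not appear. For that reason the sketch assumes each replica comes from repeating the unfolding.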