Response Matrix Estimation in Unfolding Differential Cross Sections

Huanbiao Zhu, Andrea Carlo Marini, Mikael Kuusela, Larry Wasserman

TL;DR

This paper tackles the ill-posed unfolding problem in high-energy physics by estimating the detector response kernel $k(y|x)$ directly in the unbinned space using conditional density estimation (CDE), and then constructing the discretized forward operator $\boldsymbol{K}$ via a plug-in approach. It systematically compares several nonparametric CDE methods (kernel regression, local linear, local kernel, and a location-scale model) against the traditional histogram-based estimator, showing that CDE-based estimators typically yield smoother, more accurate response matrices and, consequently, more reliable unfolded spectra. A notable finding is that a noisy histogram-based $\boldsymbol{K}$ acts as an implicit regularizer in unregularized unfolding, whereas a better-estimated $\boldsymbol{K}$ benefits from explicit regularization. The approach is validated on inclusive jet $p_T$ simulations and applied to simulated Drell–Yan plus jets data, indicating practical benefits for unfolding in real LHC analyses, while highlighting open issues in bandwidth selection and uncertainty quantification for the forward-matrix estimation.
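To make the plug-in idea concrete, here is a minimal sketch of one way to build a response matrix from a kernel conditional density estimate. It is not the paper's implementation: the Gaussian kernels, the fixed bandwidths `h_x` and `h_y`, the toy spectrum and smearing, and the shortcut of evaluating the estimated kernel only at the midpoint of each true bin are all assumptions made for illustration. The function name `cde_response_matrix` is hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without extra dependencies

def gaussian_cdf(z):
    """Standard normal CDF evaluated element-wise."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def cde_response_matrix(x_mc, y_mc, true_edges, smeared_edges, h_x, h_y):
    """Plug-in K[i, j] from a kernel estimate of k(y | x).

    Assumption: k(y | x) is evaluated at the midpoint of true bin j only,
    rather than averaged over the bin against the true spectrum.
    """
    mids = 0.5 * (true_edges[:-1] + true_edges[1:])
    # Mass that each event's Gaussian kernel in y places in each smeared bin,
    # computed in closed form from CDF differences: shape (n_events, n_smeared)
    cdf = gaussian_cdf((smeared_edges[None, :] - y_mc[:, None]) / h_y)
    mass = np.diff(cdf, axis=1)
    # Gaussian kernel weights in x around each midpoint: shape (n_events, n_true)
    w = np.exp(-0.5 * ((mids[None, :] - x_mc[:, None]) / h_x) ** 2)
    # Normalise weights per true bin, guarding against an all-zero column
    w /= np.maximum(w.sum(axis=0, keepdims=True), np.finfo(float).tiny)
    # Weighted average of per-event bin masses: shape (n_smeared, n_true)
    return mass.T @ w

# Toy Monte Carlo sample (assumed): falling spectrum with Gaussian smearing
rng = np.random.default_rng(1)
x_mc = rng.exponential(scale=1.0, size=5_000)
y_mc = x_mc + rng.normal(scale=0.2, size=x_mc.size)
edges = np.linspace(0.0, 4.0, 41)  # 40 bins on both axes
K_cde = cde_response_matrix(x_mc, y_mc, edges, edges, h_x=0.1, h_y=0.1)
print(K_cde.shape)
```

Because every entry is a smooth weighted average over all simulated events, neighbouring entries of `K_cde` vary gradually, whereas bin counting can leave sparsely populated bins noisy. The bandwidths are hard-coded here; how to choose them is one of the open issues noted above.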

Abstract

The unfolding problem in particle physics is to make inferences about the true particle spectrum based on smeared observations from a detector. This is an ill-posed inverse problem, where small changes in the smeared distribution can lead to large fluctuations in the unfolded distribution. The forward operator is the response matrix which models the detector response. In practice, the forward operator is rarely known analytically and is instead estimated using Monte Carlo simulation. This raises the question of how to best estimate the response matrix and what impact this estimation has on the unfolded solutions. In most analyses at the LHC, response matrix estimation is done by binning the true and smeared events and counting the propagation of events between the bins. However, this approach can result in a noisy estimate of the response matrix, especially with a small Monte Carlo sample size. Unexpectedly, we also find that the noise in the estimated response matrix can inadvertently regularize the problem. As an alternative, we propose to estimate the response matrix through the use of conditional density estimation of the response kernel in the unbinned setting followed by binning this estimator. Using simulation studies, we investigate the performance of the two approaches.
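The binning-and-counting estimator described in the abstract can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the exponential toy spectrum, the Gaussian smearing, and the function name `histogram_response_matrix` are hypothetical, while the 40 bins and the sample size of 100,000 mirror the setup quoted in the figure captions.

```python
import numpy as np

def histogram_response_matrix(x_true, y_smeared, true_edges, smeared_edges):
    """Estimate K[i, j] = P(smeared value in bin i | true value in bin j) by counting."""
    # counts[i, j]: MC events with smeared value in bin i and true value in bin j
    counts, _, _ = np.histogram2d(y_smeared, x_true, bins=[smeared_edges, true_edges])
    # All MC events generated in true bin j, wherever they were reconstructed
    n_true, _ = np.histogram(x_true, bins=true_edges)
    # Leave columns with no MC events at zero instead of dividing by zero
    return np.divide(counts, n_true, out=np.zeros_like(counts), where=n_true > 0)

# Toy Monte Carlo sample (assumed): falling spectrum with Gaussian smearing
rng = np.random.default_rng(0)
n_mc = 100_000
x_mc = rng.exponential(scale=1.0, size=n_mc)
y_mc = x_mc + rng.normal(scale=0.2, size=n_mc)
edges = np.linspace(0.0, 4.0, 41)  # 40 bins on both axes
K_hist = histogram_response_matrix(x_mc, y_mc, edges, edges)
print(K_hist.shape)
```

Each column of `K_hist` sums to at most one, with the shortfall corresponding to events smeared outside the binned range. In the tail of the toy spectrum the columns are built from few events, which is where the counting estimate becomes noisy.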

Paper Structure

This paper contains 23 sections, 33 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Unfolding is the process of inferring the true distribution $f$ from the smeared distribution $g$.
  • Figure 2: Estimated $40 \times 40$ response matrices using the different methods with one Monte Carlo simulation. The sample size of the Monte Carlo simulation is 100,000. The bottom-right heatmap is the true response matrix. The other heatmaps are the estimated response matrices with the different methods.
  • Figure 3: Bin-wise mean absolute errors for estimating the $40 \times 40$ response matrix using the different methods with 1,000 Monte Carlo simulations. The sample size of each Monte Carlo simulation is 100,000.
  • Figure 4: Tikhonov solutions with regularization parameters (a) $\delta=10^{-9}$ and (b) $\delta=10^{-10}$.
  • Figure 5: Unregularized least-squares solution ($\delta=0$).
  • ...and 6 more figures