Response Matrix Estimation in Unfolding Differential Cross Sections
Huanbiao Zhu, Andrea Carlo Marini, Mikael Kuusela, Larry Wasserman
TL;DR
This paper tackles the ill-posed unfolding problem in high-energy physics. It proposes to estimate the detector response kernel $k(y|x)$ directly in the unbinned space using conditional density estimation (CDE), and then to construct the discretized forward operator $\boldsymbol{K}$ via a plug-in approach. It systematically compares several nonparametric CDE methods (kernel regression, local linear, local kernel, and a location-scale model) against the traditional histogram-based estimator. The results show that CDE-based estimators typically yield smoother, more accurate response matrices and, consequently, more reliable unfolded spectra. A notable finding is an implicit regularization effect: a noisy histogram-based $\boldsymbol{K}$ regularizes unfolding that is otherwise unregularized, whereas a better-estimated $\boldsymbol{K}$ benefits from explicit regularization. The approach is validated on inclusive jet $p_T$ simulations and applied to simulated Drell–Yan plus jets data, suggesting practical benefits for unfolding in real LHC analyses. The paper also highlights open issues in bandwidth selection and in uncertainty quantification for the estimated forward matrix.
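To make the plug-in idea concrete, recall the standard binned response matrix $K_{ij} = \int_{F_i}\int_{E_j} k(y|x) f(x)\,dx\,dy \big/ \int_{E_j} f(x)\,dx$, where $E_j$ are true bins, $F_i$ are smeared bins, and $f$ is the true spectrum assumed in the Monte Carlo. The minimal sketch below replaces $k$ by an estimate $\hat k$ and approximates the integral against $f$ by averaging over the MC true values in each bin. It is not the authors' implementation. As a simple stand-in for the paper's location-scale model, it assumes a Gaussian kernel with a linear mean $\mu(x)$ and a scale $\sigma(x) = s\,x$ (constant relative resolution, so $x > 0$ is required). The names `fit_location_scale`, `plugin_response_matrix`, and `normal_cdf`, and the toy sample, are hypothetical.

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF, vectorized over arrays using math.erf (no SciPy needed).
_erf = np.vectorize(erf)

def normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / sqrt(2.0)))

def fit_location_scale(x, y):
    """Hypothetical location-scale fit y = mu(x) + sigma(x) * eps, eps ~ N(0, 1),
    with mu(x) linear in x and sigma(x) = s * x (constant relative resolution).
    Requires x > 0."""
    b1, b0 = np.polyfit(x, y, 1)                # slope and intercept of the mean
    s = np.std((y - (b0 + b1 * x)) / x)         # relative spread of the residuals
    return (lambda t: b0 + b1 * t), (lambda t: s * t)

def plugin_response_matrix(x, mu, sigma, true_edges, smear_edges):
    """Plug-in estimate of K[i, j] = P(y in F_i | x in E_j): for each true bin E_j,
    average the fitted kernel's probability mass in every smeared bin F_i over the
    MC true values x that fall in E_j (an empirical stand-in for the prior f)."""
    n_smear, n_true = len(smear_edges) - 1, len(true_edges) - 1
    K = np.zeros((n_smear, n_true))
    for j in range(n_true):
        xj = x[(x >= true_edges[j]) & (x < true_edges[j + 1])]
        if xj.size == 0:
            continue  # no MC events in this true bin: leave the column at zero
        # Standardized smeared-bin edges, one row per MC event in E_j.
        z = (smear_edges[None, :] - mu(xj)[:, None]) / sigma(xj)[:, None]
        K[:, j] = np.diff(normal_cdf(z), axis=1).mean(axis=0)
    return K

# Toy MC sample (illustrative only): uniform "true" values, 10% Gaussian smearing.
rng = np.random.default_rng(0)
edges = np.linspace(100.0, 1000.0, 10)
x = rng.uniform(100.0, 1000.0, size=2000)
y = x + 0.1 * x * rng.standard_normal(x.size)
mu, sigma = fit_location_scale(x, y)
K_plugin = plugin_response_matrix(x, mu, sigma, edges, edges)
```

Because only the kernel is smoothed while the weighting by $f$ stays empirical, each column of `K_plugin` varies smoothly with the bin edges. Column sums below 1 reflect events smeared outside the binned range. Any of the CDE methods named above could replace `fit_location_scale`, provided that it yields the kernel's probability mass in each smeared bin.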
Abstract
The unfolding problem in particle physics is to make inferences about the true particle spectrum based on smeared observations from a detector. This is an ill-posed inverse problem, where small changes in the smeared distribution can lead to large fluctuations in the unfolded distribution. The forward operator is the response matrix, which models the detector response. In practice, the forward operator is rarely known analytically and is instead estimated using Monte Carlo simulation. This raises the question of how best to estimate the response matrix and what impact this estimation has on the unfolded solutions. In most analyses at the LHC, the response matrix is estimated by binning the true and smeared events and counting how events migrate between the bins. However, this approach can result in a noisy estimate of the response matrix, especially with a small Monte Carlo sample size. Unexpectedly, we also find that the noise in the estimated response matrix can inadvertently regularize the problem. As an alternative, we propose to estimate the response matrix by first estimating the response kernel in the unbinned setting using conditional density estimation and then binning this estimator. Using simulation studies, we investigate the performance of the two approaches.
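The histogram-based estimator described above amounts to $\hat K_{ij} = N_{ij} / N_j$, where $N_{ij}$ counts MC events with true value in bin $E_j$ and smeared value in bin $F_i$, and $N_j$ counts MC events with true value in $E_j$. The minimal NumPy sketch below illustrates it; the function name `histogram_response_matrix` and the toy sample are hypothetical, not taken from the paper.

```python
import numpy as np

def histogram_response_matrix(x, y, true_edges, smear_edges):
    """Histogram estimate K_hat[i, j] = N_ij / N_j, where N_ij counts MC events with
    true value x in bin E_j and smeared value y in bin F_i, and N_j counts x in E_j."""
    # histogram2d(y, x): first axis indexes smeared bins, second axis true bins.
    counts, _, _ = np.histogram2d(y, x, bins=[smear_edges, true_edges])
    n_true, _ = np.histogram(x, bins=true_edges)
    # Divide each column by its true-bin count; empty true bins are left at zero.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_true > 0, counts / n_true, 0.0)

# Toy MC sample (illustrative only): uniform "true" values, 10% Gaussian smearing.
rng = np.random.default_rng(1)
edges = np.linspace(100.0, 1000.0, 10)
x = rng.uniform(100.0, 1000.0, size=2000)
y = x + 0.1 * x * rng.standard_normal(x.size)
K_hist = histogram_response_matrix(x, y, edges, edges)
```

Each entry is a ratio of counts, so with a small MC sample the off-diagonal entries rest on only a handful of events. This is the source of the noise that the abstract attributes to the histogram approach. Events smeared outside the smeared-bin range are counted in $N_j$ but not in any $N_{ij}$, so column sums can fall below 1.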
