Kernel Estimation in High-Energy Physics

Kyle S. Cranmer

TL;DR

This paper surveys kernel estimation as an unbinned, non-parametric method of density estimation for high-energy physics. It develops the univariate and multivariate theory, including fixed and adaptive bandwidths, boundary handling, covariance considerations, and event weighting, then describes applications ranging from confidence-level calculations to discriminant analysis and cut optimization. It also catalogs the available software packages (KEYS, HEPUKeys, PDE, RootPDE, WinPDE), contrasts kernel methods with SMOOTH, and discusses systematic errors. The emphasis throughout is on avoiding binning artifacts and on handling boundaries and heterogeneous data.

Abstract

Kernel estimation provides an unbinned and non-parametric estimate of the probability density function from which a set of data is drawn. In the first section, after a brief discussion of parametric and non-parametric methods, the theory of kernel estimation is developed for univariate and multivariate settings. The second section discusses some of the applications of kernel estimation to high-energy physics. The third section provides an overview of the available univariate and multivariate packages. The paper concludes with a discussion of the inherent advantages of kernel estimation techniques and of the systematic errors associated with the estimation of parent distributions.
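
As a rough illustration of what an unbinned, non-parametric estimate looks like in the univariate, fixed-bandwidth case, the sketch below places a Gaussian kernel on every data point and averages them. The function names, the toy sample, and the choice of the normal-reference bandwidth h = (4/3)^(1/5) σ n^(-1/5) are assumptions made for this example; they are not taken from the text above and do not reproduce the interface of any of the packages it mentions.

```python
import math

def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth h = (4/3)^(1/5) * sigma * n^(-1/5) (assumed choice)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return (4.0 / 3.0) ** 0.2 * sigma * n ** (-0.2)

def gaussian_kde(data, h):
    """Return a function x -> fixed-bandwidth Gaussian kernel estimate of the density."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump of width h centred on each data point.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density

# Hypothetical toy sample, used only to exercise the functions.
sample = [0.1, 0.4, 0.5, 0.9, 1.3, 1.4, 2.2]
h = rule_of_thumb_bandwidth(sample)
f_hat = gaussian_kde(sample, h)
print(f"h = {h:.3f}, f_hat(1.0) = {f_hat(1.0):.3f}")
```

No histogram bins appear anywhere: the estimate is a smooth function that can be evaluated at any x, and the only tunable quantity is the bandwidth h.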

Paper Structure

This paper contains 34 sections, 20 equations, 2 figures.

Figures (2)

  • Figure 1: The performance of boundary kernels on a Neural Network distribution with a hard boundary.
  • Figure 2: The standard output of the KEYS script. The top left plot shows the cumulative distributions of the KEYS shape and the data. The top right plot shows the difference between the two cumulative distributions, the maximum of which is used in the calculation of the Kolmogorov-Smirnov test. The bottom plot shows the shape produced by KEYS overlaid on a histogram of the original data.
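
The quantity described in the caption of Figure 2, the maximum difference between the cumulative distribution of the data and that of a model shape, can be sketched as follows. This is a minimal illustration, not the KEYS script: the function names are hypothetical, and a Gaussian cumulative distribution stands in for the shape that KEYS would produce.

```python
import math

def ks_statistic(data, model_cdf):
    """Largest absolute gap between the empirical CDF of data and model_cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian CDF, used here only as a stand-in model shape (assumption)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical toy sample.
sample = [-1.2, -0.3, 0.1, 0.4, 0.8, 1.5]
print(f"D = {ks_statistic(sample, normal_cdf):.3f}")
```

The step function is compared with the model on both sides of each jump because the largest gap can occur just before or just after a data point.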