Under-coverage in high-statistics counting experiments with finite MC samples
Cristina-Andreea Alexe, Joshua Bendavid, Lorenzo Bianchini, Davide Bruschini
TL;DR
The paper investigates confidence-interval (CI) coverage for a parameter of interest in high-statistics, binned counting experiments where the templates are derived from Monte Carlo (MC) samples of finite size. It demonstrates that standard asymptotic CI methods based on Wilks’ theorem or Hessian matrices can exhibit systematic under-coverage due to MC fluctuations and nuisance parameters, even with large data samples. By analyzing a paradigmatic toy model and then generalizing to the full MC-uncertainty framework, the authors show that biases arising from fluctuations in both the Jacobian blocks $\mathbf{b}$ and $\mathbf{A}$ can distort the profiled likelihood and reduce coverage. A practical heuristic interval and a scaling-based diagnostic are proposed to gauge and mitigate these effects, while highlighting that the correct likelihood is the full Barlow-Beeston form. In many realistic settings, asymptotic formulas may require substantially larger MC samples or alternative interval constructions to ensure reliable inference.
Abstract
We consider the problem of setting confidence intervals on a parameter of interest from the maximum-likelihood fit of a physics model to a binned data set with a large number of bins, large event counts per bin, and systematic uncertainties modeled as nuisance parameters. We use the profile-likelihood ratio for statistical inference and focus on the case in which the model is determined from Monte Carlo simulated samples of finite size. We start by presenting a toy model in which the properties of widely used approximations of the profile-likelihood ratio in the asymptotic limit, which are commonly expected to hold in the high-statistics regime, are manifestly broken even if the numbers of events per bin in both the data and simulated samples are seemingly large enough to warrant their validity. We then move to the general setting to show that statistical uncertainties in the Monte Carlo predictions affect the coverage of confidence intervals constructed in the asymptotic approximation always in the same direction: they lead to systematic under-coverage.
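To make the notion of coverage concrete, the following Python sketch estimates it with pseudo-experiments. It is not the toy model of the paper: it is a deliberately simpler, hypothetical setup with a single signal-strength parameter `mu`, no nuisance parameters, and a Hessian-based interval that treats the MC template as exact. The function name `coverage`, the parameters `total` and `mc_ratio`, and the Gaussian approximation to the Poisson counts are all assumptions introduced here for illustration.

```python
import math
import random


def coverage(mu_true=1.0, total=1.0e5, mc_ratio=1.0, n_toys=20000, seed=1):
    """Fraction of pseudo-experiments whose interval mu_hat +/- sigma_hat
    contains mu_true.

    total    : expected total data yield at mu = 1 (hypothetical value)
    mc_ratio : size of the MC sample relative to the expected data yield
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_toys):
        # Gaussian approximation to Poisson counts, adequate for large yields.
        n_data = rng.gauss(mu_true * total, math.sqrt(mu_true * total))
        n_mc = rng.gauss(mc_ratio * total, math.sqrt(mc_ratio * total))

        # MC estimate of the expected yield at mu = 1.
        template = n_mc / mc_ratio

        # Maximum-likelihood estimate for a Poisson yield mu * template.
        mu_hat = n_data / template

        # Hessian-based uncertainty that treats the template as exact,
        # i.e. it ignores the statistical fluctuation of the MC sample.
        sigma_hat = math.sqrt(mu_hat / template)

        if abs(mu_hat - mu_true) <= sigma_hat:
            covered += 1
    return covered / n_toys


if __name__ == "__main__":
    for ratio in (1.0, 10.0, 100.0):
        print(f"mc_ratio = {ratio:6.1f}  coverage = {coverage(mc_ratio=ratio):.3f}")
```

Under these assumptions the nominal coverage of a one-standard-deviation interval is about 68.3%. With an MC sample the same size as the data (`mc_ratio = 1`), the true variance of `mu_hat` is roughly twice the reported one, so the estimated coverage falls well below nominal and recovers only as `mc_ratio` grows. The effect studied in the paper is subtler, since it persists when MC uncertainties are included in the fit, but the procedure for measuring coverage is the same.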
