On Focusing Statistical Power for Searches and Measurements in Particle Physics

James Carzon, Aishik Ghosh, Rafael Izbicki, Ann Lee, Luca Masserano, Daniel Whiteson

TL;DR

The paper tackles the non-optimal power allocation of the generalized likelihood ratio test (LRT) for composite hypotheses in particle physics. It introduces a focused test statistic (FTS), $T_f(D; \mu_0) = -2 \log\left( p(D|\mu_0) / \int p(D|\mu) f(\mu) d\mu \right)$, where the Gaussian focus function $f(\mu)$ concentrates power in physics-motivated regions while preserving valid confidence intervals via a Neyman construction. Confidence intervals are built efficiently using ML-enhanced quantile regression to estimate critical values, enabling fast, nonasymptotic interval construction even in small-sample or high-dimensional settings. The authors demonstrate substantial gains in two case studies (a Higgs boson coupling measurement and an LZ-inspired WIMP search), achieving median CI length reductions of roughly 13–21% (Higgs) and 22–35% (WIMPs) at common confidence levels. The approach yields tighter bounds in no-signal scenarios and maintains gains when a signal is present, offering a practical, drop-in improvement for a wide range of collider, neutrino, and dark-matter analyses, including high-dimensional or unbinned cases.
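
As a minimal illustration of the FTS formula above, the sketch below evaluates $T_f$ for a toy model that is not taken from the paper: a single observation $x \sim \mathcal{N}(\mu, \sigma)$ with a Gaussian focus function $f(\mu) = \mathcal{N}(m, s)$. The function names, the toy likelihood, and the integration grid are all assumptions made for illustration. For this toy model the focus-weighted integral has a closed form, $\mathcal{N}(x; m, \sqrt{\sigma^2 + s^2})$, which serves as a cross-check on the numerical integration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def focused_test_statistic(x, mu0, m, s, sigma=1.0, n_grid=4001, half_width=10.0):
    """T_f(x; mu0) = -2 log( p(x|mu0) / integral of p(x|mu) f(mu) dmu ).

    Toy assumption: p(x|mu) = N(x; mu, sigma) and f(mu) = N(mu; m, s).
    The integral is approximated with the trapezoid rule on [m - 10 s, m + 10 s].
    """
    lo = m - half_width * s
    hi = m + half_width * s
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        mu = lo + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * normal_pdf(x, mu, sigma) * normal_pdf(mu, m, s)
    marginal = total * step
    return -2.0 * math.log(normal_pdf(x, mu0, sigma) / marginal)


def focused_test_statistic_closed_form(x, mu0, m, s, sigma=1.0):
    """Same statistic, using the closed-form Gaussian marginal of the toy model."""
    marginal = normal_pdf(x, m, math.sqrt(sigma**2 + s**2))
    return -2.0 * math.log(normal_pdf(x, mu0, sigma) / marginal)


# Example: focus centred at m = 1 with width s = 1.2, observation x = 1.5
t_numeric = focused_test_statistic(1.5, mu0=0.0, m=1.0, s=1.2)
t_exact = focused_test_statistic_closed_form(1.5, mu0=0.0, m=1.0, s=1.2)
```

In a realistic analysis the likelihood would be a binned or unbinned model with nuisance parameters, and the integral would generally have no closed form; the toy only shows how the null likelihood is compared with a focus-weighted average likelihood rather than with the maximized likelihood used by the LRT.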

Abstract

Particle physics experiments rely on the (generalised) likelihood ratio test (LRT) for searches and measurements, which consist of composite hypothesis tests. However, this test is not guaranteed to be optimal, as the Neyman-Pearson lemma pertains only to simple hypothesis tests. Any choice of test statistic thus implicitly determines how statistical power varies across the parameter space. An improvement in the core statistical testing methodology for general settings with composite tests would have widespread ramifications across experiments. We discuss an alternate test statistic that provides the data analyzer an ability to focus the power of the test on physics-motivated regions of the parameter space. We demonstrate the improvement from this technique compared to the LRT on a Higgs $\rightarrow \tau\tau$ dataset simulated by the ATLAS experiment and a dark matter dataset inspired by the LZ experiment. We also employ machine learning to efficiently perform the Neyman construction, which is essential to ensure statistically valid confidence intervals.

Paper Structure

This paper contains 18 sections, 6 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Construction of confidence intervals. (Top) To construct a $(1-\alpha)100\%$ confidence interval using the LRS statistic (yellow; Eq. \ref{eq:LRS}), we estimate the critical values $C_{\mu_0}$ (grey) via quantile regression and retain those $\mu_0$ values for which the statistic falls below the critical value. Shown are the critical values for $68\%$ (dark grey) and $95\%$ (light grey) confidence levels, with corresponding intervals displayed below the figure (dark and light yellow, respectively). (Bottom) The FTS statistic (blue; Eq. \ref{eq:FTS}) for a focus function centered at $m=1$ and with width $s=1.2$ (dashed red) is compared against the critical values. Intervals at $68\%$ and $95\%$ confidence level are shown (dark and light blue, respectively). This figure corresponds to the Higgs measurement example using the Higgs mass observable, with the test statistics evaluated on an Asimov data set [Cowan:2010js].
  • Figure 2: Lengths of confidence intervals. (a, left) Results for the Higgs measurement with wide focus ($s=2.4$). The thick lines represent the median interval length, with the uncertainty bands denoting the 25% and 75% quantiles of the length distribution. FTS (blue) yields 13% shorter intervals than LRS (orange) near the focus center ($m = 1$). (a, right) For a narrow focus ($s=1.2$), FTS intervals are about 25% shorter. That advantage is maintained even at a modest distance from the center, e.g. near $\mu^*=2$. (b, left-right) For the LZ-inspired search, the intervals are shorter across the domain of our search. Near $\mu^*=0$, the FTS intervals are about 35% shorter than the LRS intervals with wide focus, and 22% shorter with narrow focus.
  • Figure 3: Critical values. (a) We compare the mean-squared error (MSE) in the critical value estimates obtained via MC (red) and quantile regression (QR, grey) for LRS on a grid of $300$ evaluation points. For a low simulation budget of $9{,}000$ pseudo-experiments (PEs), MC estimates show high MSE whereas QR estimates (using the same number of PEs) are comparable to MC estimates with $1.35$ million PEs. The right column confirms that low-budget QR yields accurate estimates across the parameter space, unlike low-budget MC. (b) For FTS, QR is again more efficient. With $9{,}000$ PEs, QR matches the performance of MC with about $30{,}000$ PEs.
  • Figure 4: Template 2D histograms for the LZ-inspired dataset, showing the signal (nuclear recoil, NR; left) and background (electronic recoil, ER; right). Each histogram has 10 bins along both axes.
  • Figure 5: Distributions of the upper and the lower bounds of 68% confidence intervals for the LZ-inspired study, shown as box plots: the median values are represented by circles (or stars), with the 25th and 75th percentiles of each distribution connected by a line. LRS (orange) is compared to FTS with wide focus (blue) over pseudo-experiments across different values of $\mu^*$ (x-axis). Both the median upper bound and the median lower bound are closer to the bisector for FTS than LRS, which is consistent with tighter parameter constraints.
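
The interval construction described in the Figure 1 caption (retain every $\mu_0$ whose statistic falls below its critical value $C_{\mu_0}$) can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the authors' implementation: it uses a single Gaussian observation with unit resolution, a Gaussian focus with $m=1$ and $s=1.2$, and it estimates each critical value by brute-force Monte Carlo pseudo-experiments at every grid point, whereas the paper estimates critical values with quantile regression. All function names and default values are hypothetical.

```python
import math
import random


def fts(x, mu0, m=1.0, s=1.2, sigma=1.0):
    """Closed-form FTS for a toy model: x ~ N(mu, sigma), Gaussian focus N(m, s)."""
    var_marg = sigma**2 + s**2
    log_null = -0.5 * ((x - mu0) / sigma) ** 2 - 0.5 * math.log(2 * math.pi * sigma**2)
    log_marg = -0.5 * (x - m) ** 2 / var_marg - 0.5 * math.log(2 * math.pi * var_marg)
    return -2.0 * (log_null - log_marg)


def critical_value(mu0, level, n_pe, rng):
    """Empirical `level` quantile of the FTS under mu0, from n_pe pseudo-experiments.

    Plain Monte Carlo stands in here for the quantile regression used in the paper.
    """
    stats = sorted(fts(rng.gauss(mu0, 1.0), mu0) for _ in range(n_pe))
    idx = min(n_pe - 1, math.ceil(level * n_pe) - 1)
    return stats[idx]


def neyman_interval(x_obs, level=0.68, grid=None, n_pe=2000, seed=0):
    """Invert the test on a grid: keep mu0 where the observed statistic <= C_mu0.

    Returns the smallest and largest accepted grid point. The accepted set
    is not guaranteed to be contiguous in general, so this is its hull.
    """
    rng = random.Random(seed)
    if grid is None:
        grid = [-3.0 + 0.1 * i for i in range(81)]  # mu0 from -3 to 5
    accepted = [
        mu0 for mu0 in grid
        if fts(x_obs, mu0) <= critical_value(mu0, level, n_pe, rng)
    ]
    return min(accepted), max(accepted)


lo, hi = neyman_interval(x_obs=1.0)
```

Because the critical value is recomputed at every grid point, the cost of this brute-force version grows with the grid size times the number of pseudo-experiments, which is the inefficiency the Figure 3 comparison between Monte Carlo and quantile regression addresses.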