On hypothesis testing, trials factor, hypertests and the BumpHunter

Georgios Choudalakis

TL;DR

The paper tackles the look-elsewhere effect in hypothesis testing by formalizing hypothesis hypertests and introducing BumpHunter, a practical method that searches for local excesses (bumps) across a spectrum using windows of varying position and width. Its test statistic is $t = -\log p_{\min}$, where $p_{\min}$ is the smallest local $p$-value among all windows tested. The global $p$-value is then the probability, under the background-only hypothesis, of obtaining a $t$ at least this large, so the Type I error rate stays controlled despite the many tests across locations and widths. Using Problem 1 of the Banff Challenge and related sensitivity studies, the paper shows that BumpHunter can detect bumps without assuming a specific signal shape or position, compares its performance with targeted likelihood-based tests, and explores generalizations such as TailHunter and the aggregation of multiple spectra. The work highlights the trade-off between broad, model-independent searches and the efficiency cost of the trials factor, and offers practical guidance for implementing hypertests in high energy physics analyses.
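To make the statistic concrete, here is a minimal Python sketch of the window scan, under stated assumptions: binned Poisson counts, a known positive background prediction per bin, contiguous windows from `min_width` up to half the spectrum, only excesses scored, and no sidebands. The names `poisson_upper_tail` and `bumphunter_statistic` are hypothetical, and this is a simplified illustration, not the paper's implementation.

```python
import math
from typing import Optional, Sequence, Tuple


def poisson_upper_tail(d: int, b: float) -> float:
    """P(N >= d) for N ~ Poisson(b), b > 0, summed term by term from k = d."""
    if d <= 0:
        return 1.0
    total, k = 0.0, d
    while True:
        # Poisson probability of exactly k events, computed in log space.
        term = math.exp(-b + k * math.log(b) - math.lgamma(k + 1))
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if k > b and (term == 0.0 or term < 1e-16 * total):
            break
        k += 1
    return min(total, 1.0)


def bumphunter_statistic(
    data: Sequence[int],
    bkg: Sequence[float],
    min_width: int = 1,
    max_width: Optional[int] = None,
) -> Tuple[float, Optional[Tuple[int, int]]]:
    """Scan every contiguous window and return (t, best_window).

    t = -log(p_min), where p_min is the smallest local p-value over all
    windows; best_window is the half-open bin range [start, stop).
    """
    n = len(data)
    if max_width is None:
        max_width = max(1, n // 2)  # assumption: widths up to half the spectrum
    p_min, best = 1.0, None
    for width in range(min_width, max_width + 1):
        for start in range(n - width + 1):
            d = sum(data[start:start + width])
            b = sum(bkg[start:start + width])
            # Only excesses count as bumps; windows with d <= b get p = 1.
            p = poisson_upper_tail(d, b) if d > b else 1.0
            if p < p_min:
                p_min, best = p, (start, start + width)
    t = -math.log(p_min) if p_min > 0.0 else math.inf
    return t, best


if __name__ == "__main__":
    bkg = [10.0] * 20
    data = [10] * 20
    data[8:11] = [30, 30, 30]  # inject a three-bin excess
    t, window = bumphunter_statistic(data, bkg)
    print(f"t = {t:.1f}, most discrepant window = bins {window}")
```

With a flat background of 10 per bin and a three-bin excess injected at bins 8 to 10, the scan singles out exactly that window; a spectrum with no excess anywhere yields $t = 0$. Scoring deficits as $p = 1$ is a design choice matching the search for excesses described above.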

Abstract

A detailed presentation of hypothesis testing is given. The "look elsewhere" effect is illustrated, and a treatment of the trials factor is proposed with the introduction of hypothesis hypertests. An example of such a hypertest is presented, named BumpHunter, which is used in the recent ATLAS dijet resonance search, and in an earlier version in the CDF Global Search, to look for exotic phenomena in high energy physics. As a demonstration, the BumpHunter is used to address Problem 1 of the Banff Challenge.
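As a rough illustration of the trials factor, consider the idealized case of $N$ independent tests, each with local $p$-value $p_{\text{local}}$. This is a standard textbook result, not taken from the paper. BumpHunter's overlapping windows are not independent, so the formula is only a guide there; the BumpHunter results in the figures are instead calibrated with background-only pseudo-experiments.

```latex
p_{\text{global}} = 1 - \left(1 - p_{\text{local}}\right)^{N}
  \approx N\, p_{\text{local}} \qquad (N\, p_{\text{local}} \ll 1)

% Worked example: p_local = 10^{-3}, N = 100 independent windows
p_{\text{global}} = 1 - (0.999)^{100} \approx 0.095
```

A locally rare fluctuation at the $10^{-3}$ level therefore appears somewhere in the spectrum almost 10% of the time, which corresponds to a trials factor of roughly 95.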

Paper Structure

This paper contains 27 sections, 15 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Fitting an exponential spectrum with a Gaussian signal, as in the Banff Challenge. In one case the whole spectrum is fitted; in the other, the algorithm described in the paper's section on omitting the anomaly first locates the anomalous region and then fits the rest of the spectrum.
  • Figure 2: (a) The data of dataset 10, with the result of fitting the Banff Challenge background function after omitting the anomalous region, as described in the text. The bottom of the panel compares the data ($D$) to the background ($B$) in each bin, using the $\frac{D-B}{\sqrt{B}}$ approximation of significance. The blue vertical lines mark the most discrepant bump found, namely the central window of the local hypothesis test that yielded the smallest $p$-value. (b) The fit of the background-plus-signal function to the data. (c) The distribution of the BumpHunter statistic ($t$) in 690 pseudo-experiments generated to follow the distribution obtained by the fit in (a). The observed BumpHunter statistic ($t_o$) is marked by the blue arrow. (d) The two-dimensional 0.5$\sigma$ (red), 1$\sigma$ (black), and 2$\sigma$ (blue) confidence contours for the signal position and amount. The black marker and the error bars show the most likely values and the uncertainties returned by TF1::GetParError. (e) Same as (d), but for the signal position and the slope parameter $A$.
  • Figure 3: Same as Figure 2, but for dataset 0, which most likely contains no signal. A notable difference is that only 10 pseudo-experiments were generated, 9 of which have a larger BumpHunter statistic than the observed one, as shown in the panel with the distribution of the BumpHunter statistic.
  • Figure 4: Summary of the results for the 5 Banff Challenge Problem 1 datasets in which no discovery was claimed: datasets {100, 400, 500, 700, 800}, with one row of panels per dataset, in that order. The two-dimensional contours show that parameters $D$ and $E$ are poorly constrained, because the data contain no significant signal to constrain them. The corresponding most likely $p$-values are $\{\frac{8}{10}, \frac{4}{60}, \frac{5}{10}, \frac{8}{10}, \frac{3}{10} \}$.
  • Figure 5: Summary of the results for the 5 Banff Challenge Problem 1 datasets in which a discovery was claimed: datasets {22, 25, 35, 41, 42}, with one row of panels per dataset, in that order. In the 3$^{\rm rd}$ row (dataset 35), 3$^{\rm rd}$ column, the blue arrow is missing because the observed BumpHunter statistic, $t_o = 24.9$, lies outside the plotted range. The same happens for dataset 42 (last row), with $t_o = 14.9$. The corresponding most likely $p$-values are $\{\frac{11}{2250}, \frac{7}{1960}, \frac{0}{690}, \frac{266}{31080}, \frac{0}{690} \}$.
  • ...and 9 more figures
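The captions quote global $p$-values as fractions such as 9/10 or 0/690. Each fraction is the number of background-only pseudo-experiments whose BumpHunter statistic is at least the observed $t_o$, divided by the number generated. Below is a minimal Python sketch of that counting. It assumes each background bin fluctuates independently as a Poisson variable. The helpers `poisson_sample`, `pseudo_experiment` and `global_p_value` are hypothetical, and any statistic (for example a BumpHunter-style window scan) can be passed in.

```python
import math
import random
from typing import Callable, Sequence


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for modest per-bin expectations; for very large lam,
    exp(-lam) underflows and a different sampler is needed.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def pseudo_experiment(bkg: Sequence[float], rng: random.Random) -> list[int]:
    """Fluctuate each background bin independently (background-only hypothesis)."""
    return [poisson_sample(b, rng) for b in bkg]


def global_p_value(
    t_obs: float,
    statistic: Callable[[Sequence[int]], float],
    bkg: Sequence[float],
    n_pseudo: int,
    seed: int = 0,
) -> tuple[int, int]:
    """Return (k, n): k of n background-only pseudo-experiments give t >= t_obs.

    The estimated global p-value is k / n. A result of k = 0 only indicates
    that the p-value is probably below roughly 1 / n.
    """
    rng = random.Random(seed)
    k = sum(
        1 for _ in range(n_pseudo)
        if statistic(pseudo_experiment(bkg, rng)) >= t_obs
    )
    return k, n_pseudo


if __name__ == "__main__":
    bkg = [50.0, 30.0, 18.0, 11.0, 7.0, 4.0]

    def max_excess(data: Sequence[int]) -> float:
        # Toy stand-in statistic: the largest single-bin excess over background.
        return max(d - b for d, b in zip(data, bkg))

    k, n = global_p_value(t_obs=15.0, statistic=max_excess, bkg=bkg, n_pseudo=1000)
    print(f"global p-value estimate: {k}/{n}")
```

Returning the raw pair `(k, n)` rather than a float keeps the resolution visible. For example, 0/690 (datasets 35 and 42) and 0/10 carry very different information, and more pseudo-experiments are needed to resolve small $p$-values.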