The RooStats Project

Lorenzo Moneta, Kevin Belasco, Kyle Cranmer, Sven Kreiss, Alfio Lazzaro, Danilo Piparo, Gregory Schott, Wouter Verkerke, Matthias Wolf

TL;DR

RooStats presents a unified, interface-driven statistical toolkit built on ROOT/RooFit to address the diverse inference needs of LHC data, from simple counting to multi-parameter analyses and combinations. It introduces multiple calculators—ProfileLikelihood, Bayesian (analytical/numerical and MCMC), Neyman Construction, and Hybrid—that cover frequentist, Bayesian, and likelihood-based paradigms, all accessible through common interfaces. A central contribution is the workspace framework (RooWorkspace) that enables saving complete models and data for robust combinations and digital publishing of results. The package is designed for use by major experiments (ATLAS/CMS), accompanied by tutorials and examples, and supports future extensions and broader tooling integration (e.g., BAT).

Abstract

RooStats is a project to create the advanced statistical tools required for the analysis of LHC data, with emphasis on discoveries, confidence intervals, and combined measurements. The idea is to provide the major statistical techniques as a set of C++ classes with coherent interfaces, so that they can be used on arbitrary models and datasets in a common way. The classes are built on top of the RooFit package, which provides functionality for easily creating probability models, for combining analyses, and for digital publication of the results. We will present in detail the design and implementation of the different statistical methods of RooStats. We will describe the various classes for interval estimation and for hypothesis testing, which depend on different statistical techniques such as those based on the likelihood function, or on frequentist or Bayesian statistics. These methods can be applied to complex problems, including cases with multiple parameters of interest and various nuisance parameters.

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures.

Figures (3)

  • Figure 1: A class diagram of the interfaces for hypothesis testing and confidence interval calculations. The diagram shows the classes used to return the results of these statistical tests as well.
  • Figure 2: Plot of the log profile likelihood curve ($-\log \lambda$) as a function of the parameter of interest, S. The one $\sigma$ interval ($68.3\%$ CL) is obtained from the intersection of the $-\log \lambda$ curve with the horizontal dashed line ($-\log \lambda = 0.5$).
  • Figure 3: Result from the hybrid calculator, the distributions of $-2\ln Q$ in the background-only (red, on the right) and signal+background (blue, on the left) hypotheses. The black line represents the value $-2\ln Q_{obs}$ on the tested data. The shaded areas represent $1-CL_{b}$ (red) and $CL_{sb}$ (blue).
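
The interval construction described in the Figure 2 caption can be illustrated with a minimal sketch in plain Python. It does not use RooStats or RooFit. It assumes a hypothetical single-bin counting model, n_obs ~ Poisson(S + bkg), with a known background and no nuisance parameters, so the profile likelihood ratio reduces to an ordinary likelihood ratio. The function names, the bisection solver, and the numbers in the usage note are illustrative assumptions, not taken from the paper.

```python
import math


def neg_log_lambda(s, n_obs, bkg):
    """-log(lambda) for the assumed model n_obs ~ Poisson(s + bkg).

    lambda(s) = L(s) / L(s_hat), with s_hat = n_obs - bkg (requires n_obs > bkg > 0).
    """
    mu = s + bkg
    mu_hat = float(n_obs)  # expected count at the best-fit signal
    return (mu - mu_hat) - n_obs * math.log(mu / mu_hat)


def crossing(f, lo, hi, level=0.5, tol=1e-10):
    """Bisection for f(x) = level on [lo, hi]; the end points must bracket the level."""
    g_lo = f(lo) - level
    if g_lo * (f(hi) - level) > 0:
        raise ValueError("end points do not bracket the requested level")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - level
        if g_lo * g_mid <= 0:
            hi = mid  # crossing lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # crossing lies in [mid, hi]
    return 0.5 * (lo + hi)


def one_sigma_interval(n_obs, bkg):
    """Best-fit S and the two points where -log(lambda) crosses 0.5."""
    s_hat = n_obs - bkg

    def f(s):
        return neg_log_lambda(s, n_obs, bkg)

    lower = crossing(f, 0.0, s_hat)  # search between S = 0 and the minimum
    upper = crossing(f, s_hat, s_hat + 10.0 * math.sqrt(n_obs))
    return s_hat, lower, upper
```

Calling `one_sigma_interval(25, 10.0)` (hypothetical numbers) returns the best-fit signal together with the lower and upper end points of the interval. For a Poisson likelihood the curve is not a parabola, so the two end points are not equidistant from the minimum.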