Hypothesis tests and model parameter estimation on data sets with missing correlation information
Lukas Koch
TL;DR
This paper addresses statistical analyses in which the full covariance between data points is unavailable. It proposes robust test statistics for simple hypotheses (the fitted, p_min, and f_max variants) and a derating strategy that inflates parameter uncertainties when correlations are unknown. The fitted statistic minimizes the Mahalanobis distance over all feasible off-diagonal covariance blocks, and its p-values, computed from a Cee-squared distribution, are conservative. An algorithmic, whitening-based approach yields worst-case derating factors that preserve coverage up to a chosen confidence level (e.g., $\gamma=0.997$). The methods are demonstrated on neutrino interaction data (neutrino tune comparisons and cross-section tests) and extended to Goodness of Fit tests and composite hypotheses. The work gives practical guidance for combining results with partial correlation information and provides software implementations in NuStatTools.
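The idea behind the fitted statistic can be pictured in the smallest possible case: two residuals with known variances whose correlation coefficient ρ is unknown. The sketch below is a minimal illustration under that assumption and is not the NuStatTools implementation; it scans ρ over the open interval (-1, 1), where every trial covariance is positive definite, and keeps the smallest Mahalanobis distance. The function names `mahalanobis_sq` and `fitted_statistic_2pt` are hypothetical.

```python
import numpy as np

def mahalanobis_sq(d, cov):
    """Squared Mahalanobis distance d^T cov^{-1} d (solve avoids an explicit inverse)."""
    return float(d @ np.linalg.solve(cov, d))

def fitted_statistic_2pt(d, sigma1=1.0, sigma2=1.0, n_grid=1999):
    """Hypothetical toy: minimise the Mahalanobis distance of two residuals
    over the unknown correlation rho, scanned on a grid inside (-1, 1)."""
    rhos = np.linspace(-0.999, 0.999, n_grid)
    values = []
    for rho in rhos:
        cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                        [rho * sigma1 * sigma2, sigma2**2]])
        values.append(mahalanobis_sq(d, cov))
    i = int(np.argmin(values))  # index of the smallest distance on the grid
    return values[i], rhos[i]

d = np.array([2.0, 1.0])
chi2_uncorr = mahalanobis_sq(d, np.eye(2))  # value if the correlation is assumed to be zero
t_fit, rho_best = fitted_statistic_2pt(d)
print(chi2_uncorr, t_fit, rho_best)  # approximately 5.0, 4.0, 0.5
```

For d = (2, 1) with unit variances the minimum, 4 at ρ = 0.5, is lower than the value 5 obtained by assuming zero correlation. Because the minimum can never exceed the distance at the true (unknown) correlation, the statistic does not overstate the disagreement; turning it into a p-value requires the Cee-squared distribution from the paper, which this sketch does not reproduce.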
Abstract
Ideally, all analyses of normally distributed data should include the full covariance information between all data points. In practice, the full covariance matrix is not always available, either because a result was published without one or because one tries to combine multiple results from separate publications. For simple hypothesis tests, it is possible to define robust test statistics that behave conservatively in the presence of unknown correlations. For model parameter fits, one can inflate the variance by a factor to ensure that the results remain conservative at least up to a chosen confidence level. This paper describes a class of robust test statistics for simple hypothesis tests, as well as an algorithm to determine the necessary inflation factor for model parameter fits, Goodness of Fit tests, and composite hypothesis tests. It then presents example applications of the methods to real neutrino interaction data and model comparisons.
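Why an inflation factor is needed can be seen in a toy one-parameter fit: a common mean estimated from several measurements with inverse-variance weights, assuming they are independent. The sketch below is an illustration under that assumption only; it computes the worst case over all possible correlations of the variance itself, and it is not the paper's whitening-based algorithm, which targets coverage up to a chosen level γ. The helper names `independent_weighted_mean`, `true_variance`, and `worst_case_inflation` are hypothetical.

```python
import numpy as np

def independent_weighted_mean(sigmas):
    """Inverse-variance weights and variance of the mean, assuming independence."""
    inv_var = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    weights = inv_var / inv_var.sum()
    return weights, 1.0 / inv_var.sum()

def true_variance(weights, sigmas, corr):
    """Actual variance w^T Sigma w of the weighted mean for a given correlation matrix."""
    s = np.asarray(sigmas, dtype=float)
    cov = np.asarray(corr, dtype=float) * np.outer(s, s)
    return float(weights @ cov @ weights)

def worst_case_inflation(sigmas):
    """Factor f such that f * (assumed variance) equals the largest possible true variance.

    With non-negative weights, |Sigma_ij| <= sigma_i * sigma_j bounds w^T Sigma w
    by (sum_i w_i sigma_i)^2, which is reached for full positive correlation."""
    weights, v_assumed = independent_weighted_mean(sigmas)
    v_worst = float(weights @ np.asarray(sigmas, dtype=float)) ** 2
    return v_worst / v_assumed

sigmas = [1.0, 2.0]
w, v0 = independent_weighted_mean(sigmas)
# Scan the unknown correlation between the two measurements; the ratio peaks at rho = 1.
scan = [true_variance(w, sigmas, [[1.0, r], [r, 1.0]]) / v0
        for r in np.linspace(-1.0, 1.0, 201)]
print(max(scan), worst_case_inflation(sigmas))  # both approximately 1.8
```

For N measurements with equal uncertainties the factor is N, since full correlation means no averaging gain at all. Scaling every variance by f is equivalent to dividing a χ² computed with the assumed covariance by f, which is how a variance inflation makes a fit more conservative.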
