Tests of goodness of fit to multiple data sets
J. C. Collins, J. Pumplin
TL;DR
This paper argues that the overall $\chi^2$ alone is insufficient for judging goodness of fit in global theory-data analyses with many parameters. It develops a subset-based criterion modeled on the stronger criterion used for fitting parameters. The criterion exploits the dependence of $\chi^2$ on the parameters and uses Lagrange multiplier techniques to test whether each data subset is consistent with the rest. Using a two-experiment toy model and the CTEQ5 parton-distribution fits, it shows that substantial inconsistencies can exist even when the total $\chi^2$ looks acceptable, and that these can reveal bugs in the data or the theory. It also introduces diagnostic tools: plots of a subset's $\chi^2_i$ versus $\chi^2_{tot}$ and versus $\chi^2_{not\,i}$, and a one-parameter model. These tools simplify the diagnosis of data-theory tensions and guide robust uncertainty estimation.
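The subset consistency test can be illustrated with a toy model. This is a hypothetical sketch, not the paper's code. It assumes a single parameter `a` and two experiments whose $\chi^2$ curves are exactly quadratic in `a`. The constrained minimum of $\chi^2_{not\,i}$ at fixed $\chi^2_i$ is found by minimizing $\chi^2_{not\,i} + \lambda\,\chi^2_i$ for a sequence of multipliers $\lambda$. The paper's own parametrization of the scan may differ. The names `Experiment`, `constrained_fit`, and `lagrange_scan`, and all numerical values, are invented for this example.

```python
"""Toy two-experiment model: trace chi2_i against chi2_not_i with a Lagrange multiplier.

Illustrative assumptions (not taken from the paper): one fit parameter `a`,
and each experiment's chi-square is exactly quadratic,
    chi2(a) = c + (a - m)**2 / s**2,
where c is its minimum, m its own best-fit value, and s its 1-sigma error on a.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    c: float  # minimum chi-square of this experiment fitted alone (hypothetical)
    m: float  # value of `a` preferred by this experiment alone
    s: float  # one-standard-deviation uncertainty on `a` from this experiment

    def chi2(self, a: float) -> float:
        return self.c + (a - self.m) ** 2 / self.s ** 2


def constrained_fit(subset: Experiment, rest: Experiment, lam: float) -> float:
    """Return the `a` that minimizes chi2_rest(a) + lam * chi2_subset(a).

    For quadratic chi-squares this minimum is the inverse-variance weighted
    mean of the two preferred values, with the subset's weight scaled by lam.
    lam = 1 reproduces the ordinary global fit.
    """
    w_sub = lam / subset.s ** 2
    w_rest = 1.0 / rest.s ** 2
    return (w_sub * subset.m + w_rest * rest.m) / (w_sub + w_rest)


def lagrange_scan(subset: Experiment, rest: Experiment, lams):
    """Return (lam, chi2_subset, chi2_rest) at each constrained minimum."""
    rows = []
    for lam in lams:
        a = constrained_fit(subset, rest, lam)
        rows.append((lam, subset.chi2(a), rest.chi2(a)))
    return rows


if __name__ == "__main__":
    exp1 = Experiment(c=499.0, m=0.0, s=0.2)  # hypothetical subset i
    exp2 = Experiment(c=499.0, m=1.0, s=0.2)  # hypothetical "all other data"
    for lam, c1, c2 in lagrange_scan(exp1, exp2, [0.01, 0.1, 1.0, 10.0, 100.0]):
        print(f"lambda={lam:7.2f}  chi2_1={c1:8.3f}  chi2_2={c2:8.3f}")
```

With these hypothetical numbers, $\lambda = 1$ reproduces the global fit $a = 0.5$. At that point each experiment's $\chi^2$ sits 6.25 above its own minimum. As $\lambda$ grows, $\chi^2_1$ falls toward 499 while $\chi^2_2$ rises toward 524. Plotting the returned pairs gives a $\chi^2_i$ versus $\chi^2_{not\,i}$ curve of the kind described above.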
Abstract
We propose a new and rather stringent criterion for testing the goodness of fit between a theory and experiment. It is motivated by the paradox that the criterion on $\chi^2$ for testing a theory is much weaker than the criterion for finding the best-fit value of a parameter in the theory. We present a method by which the stronger parameter-fitting criterion can be applied to subsets of data in a global fit.
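The numbers below illustrate this paradox using invented toy values rather than any data from the paper. A global fit to $N = 1000$ points with one parameter has $N - 1 = 999$ degrees of freedom. Its total $\chi^2$ is therefore judged against a spread of about $\sqrt{2 \times 999} \approx 44.7$. By contrast, a parameter's one-standard-deviation range corresponds to $\Delta\chi^2 = 1$. The sketch assumes two experiments of 500 points each, with $\chi^2$ curves exactly quadratic in the parameter. The helper name `paradox_summary` and all numbers are hypothetical.

```python
import math


def paradox_summary(n_points=1000, n_params=1,
                    c1=499.0, m1=0.0, s1=0.2,
                    c2=499.0, m2=1.0, s2=0.2):
    """Compare the goodness-of-fit and parameter-fitting criteria on a toy model.

    Hypothetical model: chi2_i(a) = c_i + (a - m_i)**2 / s_i**2 for i = 1, 2,
    where c_i is experiment i's own minimum chi-square, m_i its preferred
    value of the single parameter a, and s_i its 1-sigma error on a.
    """
    # Global best fit: inverse-variance weighted mean of the two preferred values.
    w1, w2 = 1.0 / s1**2, 1.0 / s2**2
    a_best = (w1 * m1 + w2 * m2) / (w1 + w2)
    chi2_tot = (c1 + (a_best - m1) ** 2 / s1**2) + (c2 + (a_best - m2) ** 2 / s2**2)

    # Goodness-of-fit yardstick: a chi-square variable with ndf degrees of
    # freedom has mean ndf and standard deviation sqrt(2 * ndf).
    ndf = n_points - n_params
    z_total = (chi2_tot - ndf) / math.sqrt(2.0 * ndf)

    # Parameter-fitting yardstick: the rise of chi2_tot above the sum of the
    # separate minima. With one shared parameter, sqrt(delta) is the tension in sigma.
    delta_chi2 = chi2_tot - (c1 + c2)
    tension_sigma = math.sqrt(delta_chi2)

    return {"a_best": a_best, "chi2_tot": chi2_tot, "ndf": ndf,
            "z_total": z_total, "delta_chi2": delta_chi2,
            "tension_sigma": tension_sigma}


if __name__ == "__main__":
    for key, value in paradox_summary().items():
        print(f"{key:>14}: {value:.3f}")
```

Here the total $\chi^2 \approx 1010.5$ lies only about 0.26 standard deviations above its expected value of 999. Yet forcing a common parameter value costs $\Delta\chi^2 = 12.5$ relative to fitting each experiment separately. By the parameter-fitting criterion, that is roughly a $3.5\sigma$ disagreement.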
