Tests of goodness of fit to multiple data sets

J. C. Collins, J. Pumplin

TL;DR

This paper argues that relying solely on the overall chi-square is insufficient for judging goodness of fit in global theory-data analyses with many parameters. It develops a stronger, parameter-fitting criterion applied to subsets of the data, exploiting the dependence of $\chi^2$ on the fit parameters and using Lagrange-multiplier techniques to test consistency across data subsets. Through a two-experiment toy model and the CTEQ5 parton-distribution fits, it demonstrates that substantial inconsistencies between data sets can exist even when the total $\chi^2$ looks acceptable, revealing hidden problems in the data or the theory. It also introduces visualization tools—plots of $\chi_i^2$ versus $\chi_{\rm tot}^2$, of $\chi_i^2$ versus $\chi_{{\rm not}\,i}^2$, and a one-parameter model—that streamline the diagnosis of data-theory tensions and guide robust uncertainty estimation.
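The Lagrange-multiplier scan summarized above can be sketched in a minimal two-experiment toy model. All numbers below are hypothetical illustrations, not values from the paper: each "experiment" is given a quadratic $\chi^2(p)$ in a single parameter $p$ (the paper's one-parameter model), and minimizing $\chi_{\rm tot}^2 + \lambda\,\chi_1^2$ for a range of $\lambda$ traces how much experiment 1's fit can improve at the expense of the overall fit.

```python
# Toy illustration of the Lagrange-multiplier scan (hypothetical numbers,
# not the paper's actual CTEQ5 fit).  Each experiment's chi^2 is modeled
# as quadratic in the single parameter p, following the one-parameter model.

def chi2_exp(p, p_best, sigma, n_points):
    """Quadratic chi^2 of one experiment about its preferred value p_best."""
    return n_points + ((p - p_best) / sigma) ** 2

# Hypothetical inputs: experiment 1 prefers p = 1.0, experiment 2 prefers
# p = 1.5 -- a mild tension between the two data sets.
E1 = dict(p_best=1.0, sigma=0.1, n_points=100)
E2 = dict(p_best=1.5, sigma=0.2, n_points=100)

def chi2_1(p):
    return chi2_exp(p, **E1)

def chi2_2(p):
    return chi2_exp(p, **E2)

def chi2_tot(p):
    return chi2_1(p) + chi2_2(p)

def scan(lam):
    """Minimize chi2_tot(p) + lam * chi2_1(p) over p.

    For a sum of quadratics the minimum is an inverse-variance weighted
    average, so no numerical optimizer is needed here.  Returns the
    minimizing p and the values of chi2_tot and chi2_1 there.
    """
    w1 = (1.0 + lam) / E1["sigma"] ** 2  # weight on experiment 1 (boosted by lam)
    w2 = 1.0 / E2["sigma"] ** 2          # weight on experiment 2
    p_min = (w1 * E1["p_best"] + w2 * E2["p_best"]) / (w1 + w2)
    return p_min, chi2_tot(p_min), chi2_1(p_min)

# Scanning lam maps out the curve of chi2_1 versus chi2_tot: as lam grows,
# chi2_1 falls toward its own minimum while chi2_tot rises, exposing the
# tension between the two data sets.
for lam in (0.0, 1.0, 5.0, 25.0):
    p, tot, c1 = scan(lam)
    print(f"lam={lam:5.1f}  p={p:.3f}  chi2_tot={tot:7.2f}  chi2_1={c1:7.2f}")
```

If the two experiments were mutually consistent, $\chi_1^2$ would change only marginally along the scan; a steep trade-off between $\chi_1^2$ and $\chi_{\rm tot}^2$ is the signature of inconsistency that the paper's criterion is designed to detect.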

Abstract

We propose a new and rather stringent criterion for testing the goodness of fit between a theory and experiment. It is motivated by the paradox that the criterion on χ^2 for testing a theory is much weaker than the criterion for finding the best fit value of a parameter in the theory. We present a method by which the stronger parameter-fitting criterion can be applied to subsets of data in a global fit.


Paper Structure

This paper contains 11 sections, 31 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Hypothetical plot of $\chi^2$ vs. parameter $p$ in fitting of theory to data with $N=100$ points.
  • Figure 2: Typical plots of $\chi^2(p)$ when the experiment of Fig. 1 is repeated.
  • Figure 3: Possible results of plotting $\chi_i^2$ for subsets of data as a function of $\chi_{\rm tot}^2$.
  • Figure 4: Possible results of plotting the minimum of $\chi_i^2$ for a subset of the data as a function of $\chi^2$ for the remaining data. The diagonal dashed line at $-45^\circ$ is tangent to the curve at the point where $\chi_{\rm tot}^2$ has its minimum value.
  • Figure 5: Variation of $\chi_i^2$ with $\chi_{\rm tot}^2$ for 8 of the data sets of the CTEQ5 parton density analysis. Dots mark the points found by taking $\lambda = 5$ in the Lagrange-multiplier condition.
  • ...and 3 more figures