A simple tool for weighted averaging of inconsistent data sets
Martino Trassinelli, Marleen Maxton
TL;DR
This work addresses the challenge of combining inconsistent measurements by replacing the standard inverse-variance weighted average with a Bayesian framework that marginalises over the unknown true uncertainties. Building on Sivia and Skilling, it introduces two priors, the conservative prior and Jeffreys' prior, which yield non-Gaussian likelihoods with heavy tails that are robust against outliers and underestimated uncertainties. The approach is demonstrated on synthetic data, CODATA values for the Newtonian constant, and PDG particle properties. It gives generally more robust and realistic uncertainty estimates and reveals when the full posterior distribution must be used instead of a single weighted mean. A freely available Python tool, bayesian_average, facilitates practical adoption by offering transparent comparisons with traditional methods and visualisation of the complete posterior distributions. The method provides a simple, broadly applicable alternative for robust data fusion in contexts where interlaboratory discrepancies and outliers distort standard analyses.
Abstract
Computing the weighted average of inconsistent data is a common and tedious problem that many scientists encounter. The standard weighted average is not recommended in these cases, and various alternative methods have been proposed. Their suitability depends on the nature of the data, which can make selecting the appropriate method difficult without expertise in metrology or statistics. For the analysis of simple data sets presenting inconsistencies, we discuss the method proposed by Sivia in 1996, which is based on Bayesian statistics. We choose this method to maintain generality while minimising the number of assumptions. In this approach, the uncertainty associated with each input value is considered to be only a lower bound of the true, unknown uncertainty. The resulting likelihood function is no longer Gaussian but has smoothly decreasing wings, which allows for a better treatment of scattered data and outliers. To demonstrate the robustness and generality of the method, we apply it to a series of critical data sets: simulations, the CODATA recommended values of the Newtonian gravitational constant, and some particle properties from the Particle Data Group, including the proton charge radius. A freely available Python library is also provided for a simple implementation of the proposed averaging method.
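To make the idea of treating each quoted uncertainty as a lower bound concrete, the following is a minimal sketch and not the interface of the accompanying library. It assumes the conservative prior takes the form p(σ|σ0) = σ0/σ² for σ ≥ σ0, as in Sivia's formulation. Marginalising the Gaussian over σ then gives a likelihood proportional to (1 − exp(−R²/2))/R², with R = (x − μ)/σ0. The function names, the flat prior on μ, the grid integration, and the numerical values are all illustrative assumptions.

```python
import numpy as np

def standard_weighted_average(x, sigma):
    """Inverse-variance weighted mean and its standard uncertainty."""
    w = 1.0 / sigma**2
    return np.sum(w * x) / np.sum(w), 1.0 / np.sqrt(np.sum(w))

def conservative_log_likelihood(mu, x, sigma0):
    """Log-likelihood on a grid of mu, assuming the prior sigma0/sigma^2 (sigma >= sigma0)."""
    r2 = ((x[None, :] - mu[:, None]) / sigma0[None, :]) ** 2
    nonzero = r2 > 1e-12
    safe = np.where(nonzero, r2, 1.0)  # placeholder avoids 0/0 below
    # (1 - exp(-R^2/2)) / R^2, with its limit 1/2 as R -> 0
    core = np.where(nonzero, -np.expm1(-safe / 2.0) / safe, 0.5)
    return np.sum(np.log(core / (np.sqrt(2.0 * np.pi) * sigma0[None, :])), axis=1)

def conservative_average(x, sigma0, n_grid=20001, pad=10.0):
    """Posterior mean and standard deviation of mu (flat prior, uniform grid)."""
    mu = np.linspace(np.min(x - pad * sigma0), np.max(x + pad * sigma0), n_grid)
    dmu = mu[1] - mu[0]
    logp = conservative_log_likelihood(mu, x, sigma0)
    p = np.exp(logp - logp.max())   # subtract the maximum for numerical stability
    p /= np.sum(p) * dmu            # normalise the posterior on the grid
    mean = np.sum(mu * p) * dmu
    std = np.sqrt(np.sum((mu - mean) ** 2 * p) * dmu)
    return mean, std

# Illustrative (made-up) data: three consistent values and one outlier.
x = np.array([10.0, 10.2, 9.9, 14.0])
sigma = np.array([0.2, 0.2, 0.2, 0.2])

std_mean, std_unc = standard_weighted_average(x, sigma)
cons_mean, cons_std = conservative_average(x, sigma)
print(f"standard:     {std_mean:.3f} +/- {std_unc:.3f}")
print(f"conservative: {cons_mean:.3f} +/- {cons_std:.3f}")
```

In this sketch the outlier pulls the standard weighted mean well away from the three consistent values, while the heavy-tailed likelihood keeps the estimate close to them and reports a larger uncertainty. When the posterior is multimodal or strongly skewed, a single mean and standard deviation summarise it poorly, and the full distribution on the grid should be examined instead.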
