Bayesian Model Comparison and Significance: Widespread Errors and how to Correct Them
Daniel P. Thorngren, David K. Sing, Sagnick Mukherjee
TL;DR
The paper exposes the widespread misuse of the inverse-Sellke transformation to convert Bayes factors into significances in exoplanet atmosphere studies, showing that this approach overstates confidence. It advocates adopting the standard Bayesian interpretation of Bayes factors as odds or posterior model probabilities and recommends supplementing them with information criteria such as AIC or BPICS, especially given prior-sensitivity concerns. Through a WASP-39 b case study, it demonstrates that the inverse-Sellke method can exaggerate significance (e.g., $n_\text{sigma}^*=3.73$ vs $p(\mathcal{B}|y)=0.0044$ for SO$_2$), while multiple posterior-based criteria provide a more nuanced assessment. The authors propose practical guidelines: report Bayes factors with proper priors, use BPICS/AIC (and WAIC/DIC when appropriate), and emphasize robust prior choices and transparent data handling to improve reproducibility and observational planning in exoplanet spectroscopy.
Abstract
Bayes factors have become a popular tool in exoplanet spectroscopy for testing atmosphere models against one another. We show that the commonly used method for converting these values into significance "sigmas" is invalid. The formula is neither justified nor recommended by its original paper, and it overestimates the confidence of results. We use simple examples to demonstrate the invalidity and prior sensitivity of this approach. We review the standard Bayesian interpretation of the Bayes factor as an odds ratio and recommend its use in conjunction with the Akaike Information Criterion (AIC) or Bayesian Predictive Information Criterion Simplified (BPICS) in future analyses (Python implementations are included). As a concrete example, we refit the WASP-39 b NIRSpec transmission spectrum to test for the presence of SO$_2$. The prevalent, incorrect significance calculation gives $3.67\sigma$, whereas the standard Bayesian interpretation yields a null model probability $p(\mathcal{B}|y)=0.0044$. Surveying the exoplanet atmosphere literature, we find widespread use of the erroneous formula. To avoid overstating observational results and underestimating the observation time required, the community should return to the standard Bayesian interpretation.
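To make the contrast concrete, the following is a minimal sketch of the two conversions discussed above, not the Python implementation distributed with the paper. It assumes the Bayes factor is defined as $K = p(y|\mathcal{A})/p(y|\mathcal{B})$ with $\mathcal{B}$ the null model, equal prior model odds by default, the Sellke bound in the form $K \le -1/(e\,p\ln p)$ for $p < 1/e$, and a two-sided convention for turning a p-value into "sigmas". The function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def null_model_probability(bayes_factor, prior_odds=1.0):
    """Posterior probability of the null model B.

    bayes_factor: K = p(y|A) / p(y|B), evidence ratio favoring model A.
    prior_odds:   p(A) / p(B); 1.0 means both models are equally likely a priori
                  (an assumption of this sketch).
    """
    posterior_odds = bayes_factor * prior_odds
    return 1.0 / (1.0 + posterior_odds)


def inverse_sellke_sigma(bayes_factor):
    """The criticized conversion: invert K = -1 / (e p ln p) for p, then map p to sigma.

    Shown only to illustrate what the commonly used formula computes.
    """
    if bayes_factor <= 1.0:
        raise ValueError("the Sellke bound can only be inverted for K > 1")
    target = 1.0 / bayes_factor
    # -e p ln(p) increases monotonically from 0 to 1 on (0, 1/e), so bisect there.
    lo, hi = 1e-300, 1.0 / math.e
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if -math.e * mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    p_value = 0.5 * (lo + hi)
    # Two-sided Gaussian significance (a convention assumed here).
    return -NormalDist().inv_cdf(p_value / 2.0)


if __name__ == "__main__":
    # Bayes factor implied by a null model probability of 0.0044 with equal prior odds.
    k = (1.0 - 0.0044) / 0.0044
    print(f"K = {k:.1f}")
    print(f"p(B|y) = {null_model_probability(k):.4f}")
    print(f"inverse-Sellke 'sigma' = {inverse_sellke_sigma(k):.2f}")
```

Under these assumptions, a null model probability of 0.0044 corresponds to odds of roughly 226 to 1, which the inverse-Sellke route reports as about $3.7\sigma$. A Gaussian $3.7\sigma$ result is usually read as a false-alarm probability of order $10^{-4}$, which is more than an order of magnitude smaller than the null model probability itself.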
