Bayesian Model Comparison and Significance: Widespread Errors and how to Correct Them

Daniel P. Thorngren, David K. Sing, Sagnick Mukherjee

TL;DR

The paper exposes the widespread misuse of the inverse-Sellke transformation to convert Bayes factors into significances in exoplanet atmosphere studies, showing that this approach overstates confidence. It advocates adopting the standard Bayesian interpretation of Bayes factors as odds or posterior model probabilities and recommends supplementing them with information criteria such as AIC or BPICS, especially given prior-sensitivity concerns. Through a WASP-39 b case study, it demonstrates that the inverse-Sellke method can exaggerate significance (e.g., $n_\text{sigma}^*=3.73$ versus a null model probability $p(\mathcal{B}|y)=0.0044$ for SO$_2$), while multiple posterior-based criteria provide a more nuanced assessment. The authors propose practical guidelines: report Bayes factors with proper priors, use BPICS/AIC (and WAIC/DIC when appropriate), and emphasize robust prior choices and transparent data handling to improve reproducibility and observational planning in exoplanet spectroscopy.

Abstract

Bayes factors have become a popular tool in exoplanet spectroscopy for testing atmosphere models against one another. We show that the commonly used method for converting these values into significance "sigmas" is invalid. The formula is neither justified nor recommended by its original paper, and overestimates the confidence of results. We use simple examples to demonstrate the invalidity and prior sensitivity of this approach. We review the standard Bayesian interpretation of the Bayes factor as an odds ratio and recommend its use in conjunction with the Akaike Information Criterion (AIC) or Bayesian Predictive Information Criterion Simplified (BPICS) in future analyses (Python implementations are included). As a concrete example, we refit the WASP-39 b NIRSpec transmission spectrum to test for the presence of SO$_2$. The prevalent, incorrect significance calculation gives $3.67\sigma$ whereas the standard Bayesian interpretation yields a null model probability $p(\mathcal{B}|y)=0.0044$. Surveying the exoplanet atmosphere literature, we find widespread use of the erroneous formula. To avoid overstating observational results and underestimating the observation time required, the community should return to the standard Bayesian interpretation.
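The AIC recommended above can be illustrated with a minimal sketch. This is not the Python implementation shipped with the paper: the data, the known Gaussian error `sigma`, and all helper names are hypothetical, chosen only to show the standard definition $\mathrm{AIC} = 2k - 2\ln\hat{L}$ applied to a constant (null) model and a straight-line (alternate) model.

```python
import math

def gaussian_log_likelihood(y, model, sigma):
    """Log-likelihood of data y under a model with known Gaussian error sigma."""
    norm_term = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((yi - mi) / sigma) ** 2 - norm_term
               for yi, mi in zip(y, model))

def aic(max_log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L_max); lower is better."""
    return 2.0 * n_params - 2.0 * max_log_likelihood

# Hypothetical data (an assumption for illustration, not the paper's data set):
# a slope of 0.15 plus a small deterministic wiggle standing in for noise.
x = [float(i) for i in range(10)]
y = [0.15 * xi + 0.05 * (-1) ** i for i, xi in enumerate(x)]
sigma = 0.1

# Null model: a constant. Its maximum-likelihood value is the mean of y.
y_mean = sum(y) / len(y)
logl_null = gaussian_log_likelihood(y, [y_mean] * len(y), sigma)

# Alternate model: a straight line. With known Gaussian errors the
# maximum-likelihood fit is ordinary least squares.
x_mean = sum(x) / len(x)
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean
logl_alt = gaussian_log_likelihood(y, [intercept + slope * xi for xi in x], sigma)

aic_null = aic(logl_null, n_params=1)
aic_alt = aic(logl_alt, n_params=2)

# Relative likelihood of the null model with respect to the lower-AIC model.
aic_best = min(aic_null, aic_alt)
relative_likelihood_null = math.exp((aic_best - aic_null) / 2.0)

print(f"AIC null = {aic_null:.2f}, AIC alternate = {aic_alt:.2f}")
print(f"relative likelihood of null = {relative_likelihood_null:.3g}")
```

Because the AIC depends only on the maximum likelihood and the parameter count, it does not inherit the prior sensitivity of the Bayes factor, which is one reason to report the two side by side.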

Paper Structure

This paper contains 19 sections, 20 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The data used in our example model comparison in black, along with the null (Eq. \ref{eq:nullModel}) and alternate (Eq. \ref{eq:alternateModel}) model posterior predictives (median and 16th to 84th percentile ranges). The data were created from Eq. \ref{eq:exampleFunction} with $m=0.15$, so the alternate model was expected to be preferred.
  • Figure 2: A comparison of the sigma obtained from the incorrect interpretation of Sellke et al. (2001) (right axis) with the model probability $p(\mathcal{B}|y)$ under the standard Bayesian interpretation (left axis) for a given Bayes factor $B$. The erroneous procedure produces "significant" values of sigma even for rather low odds ratios, e.g. $3\sigma$ at 23:1 odds (probability $p(\mathcal{B}|y)=0.042$) and $2\sigma$ at 2.6:1 odds ($p(\mathcal{B}|y)=0.28$).
  • Figure 3: The WASP-39 b transmission spectrum captured by JWST NIRSpec and reduced in Rustamkulov et al. (2023), with our model fits overplotted as the medians and $1\sigma$ contours. Two models are given, one with sulfur dioxide and one without. The SO$_2$ model is moderately favored by the BIC, DIC, and Bayes factor, but does not exceed $3\sigma$. Inverting Eq. \ref{eq:sellke} erroneously gives a significance above $3\sigma$.
  • Figure 4: The posterior of the atmosphere model without SO$_2$ fitted to the WASP-39 b transit spectrum, as described in Sec. \ref{sec:retrieval}. Selected model comparison statistics are listed as well.
  • Figure 5: The posterior of the atmosphere model including SO$_2$ fitted to the WASP-39 b transit spectrum, as described in Sec. \ref{sec:retrieval}. Selected model comparison statistics are listed as well.
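
The contrast drawn in the Figure 2 caption can be reproduced with a short sketch. It assumes the Sellke et al. (2001) bound takes its usual form $B \le -1/(e\,p\ln p)$ for $p < 1/e$, that the criticized procedure treats this bound as an equality and solves it for $p$, and that $p$ is then converted to sigma with a two-sided Gaussian tail. That two-sided convention is an assumption here, adopted because it matches the odds quoted in the caption. The function names are hypothetical, and equal prior odds between the models are assumed by default.

```python
import math
from statistics import NormalDist

def null_probability(bayes_factor, prior_odds=1.0):
    """Standard interpretation: posterior probability of the null model.

    bayes_factor is the evidence ratio of the alternate over the null model;
    prior_odds is the prior odds of alternate over null (1.0 = equal priors).
    """
    return 1.0 / (1.0 + bayes_factor * prior_odds)

def inverse_sellke_sigma(bayes_factor):
    """The criticized conversion, shown for comparison only.

    Treats the bound B <= -1/(e p ln p) as an equality, solves for p by
    bisection, then maps p to a two-sided Gaussian sigma.
    """
    if bayes_factor <= 1.0:
        raise ValueError("the bound is only defined for p < 1/e, i.e. B > 1")
    target = math.log(bayes_factor)
    # Work with u = ln(p), where ln(bound) = -1 - u - ln(-u).
    # This expression decreases monotonically in u on (-inf, -1).
    lo, hi = -700.0, -1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if -1.0 - mid - math.log(-mid) > target:
            lo = mid  # bound still too large: move toward larger p
        else:
            hi = mid
    p_value = math.exp(0.5 * (lo + hi))
    return -NormalDist().inv_cdf(p_value / 2.0)

for odds in (2.6, 23.0):
    print(f"B = {odds:5.1f}: p(null|y) = {null_probability(odds):.3f}, "
          f"inverse-Sellke sigma = {inverse_sellke_sigma(odds):.2f}")
```

Run on the two odds ratios quoted in the caption, the sketch should return sigmas close to 2 and 3 alongside null model probabilities near 0.28 and 0.042. A null model that retains a 4% posterior probability is far weaker evidence than a "$3\sigma$" label suggests.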