
Empirical Bayes learning from selectively reported confidence intervals

Hunter Chen, Junming Guan, Erik van Zwet, Nikolaos Ignatiadis

Abstract

We develop a statistical framework for empirical Bayes learning from selectively reported confidence intervals, and apply it to provide context for interpreting results published in MEDLINE abstracts. We use a collection of 326,060 z-scores from MEDLINE abstracts (2000-2018) as the input for an empirical Bayes analysis, with publication bias as a key methodological challenge. We address publication bias through a selective tilting approach that extends empirical Bayes confidence intervals to truncated sampling. Our framework provides coverage guarantees for functionals including posterior estimands describing idealized replications and the symmetrized posterior mean, which we justify decision-theoretically as optimal among sign-equivariant (odd) estimators.

Paper Structure

This paper contains 66 sections, 15 theorems, 102 equations, 10 figures, 8 tables.

Key Result

Proposition 2

Under model \eqref{eq:publication_bias_model}, Assumption \ref{assum:publication_bias} holds if and only if there exists a constant $a \in (0,1]$ such that $\pi(\lvert z \rvert) = a$ almost everywhere on $\mathcal{S}$.
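
The "if" direction can be sketched in a few lines. The notation here is an assumption, since this summary does not spell out the model: $f_G$ denotes the marginal density of a z-score $Z$ before selection, $D = 1$ indicates publication with $\mathbb{P}(D = 1 \mid Z = z) = \pi(\lvert z \rvert)$, and the assumption is read as requiring that published z-scores with $\lvert Z \rvert \in \mathcal{S}$ follow the marginal density truncated to $\mathcal{S}$. Under that reading, a constant selection probability on $\mathcal{S}$ cancels out of the conditional density:

```latex
\begin{align*}
p\bigl(z \mid D = 1,\ \lvert Z \rvert \in \mathcal{S}\bigr)
  &= \frac{\pi(\lvert z \rvert)\, f_G(z)\, \mathbf{1}\{\lvert z \rvert \in \mathcal{S}\}}
          {\int_{\{\lvert u \rvert \in \mathcal{S}\}} \pi(\lvert u \rvert)\, f_G(u)\, \mathrm{d}u} \\
  &= \frac{a\, f_G(z)\, \mathbf{1}\{\lvert z \rvert \in \mathcal{S}\}}
          {a \int_{\{\lvert u \rvert \in \mathcal{S}\}} f_G(u)\, \mathrm{d}u}
   = \frac{f_G(z)\, \mathbf{1}\{\lvert z \rvert \in \mathcal{S}\}}
          {\int_{\{\lvert u \rvert \in \mathcal{S}\}} f_G(u)\, \mathrm{d}u}.
\end{align*}
```

The last expression no longer involves $\pi$, so the unknown publication probability $a$ does not need to be estimated once the analysis is restricted to $\mathcal{S}$.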

Figures (10)

  • Figure 1: Histogram of 326,060 z-scores from abstracts (one z-score per abstract) appearing in MEDLINE (2000-2018). See Supplement \ref{sec:medline} for preprocessing details.
  • Figure 2: Selection process and resulting distributions: (a) publication selection ($D=1$) then analyst truncation ($\lvert Z\rvert\in\mathcal{S}$); (b) empirical absolute $z$-score distribution under the analyst’s truncation.
  • Figure 3: Schematic illustration of selective tilting, with an example of $\operatorname{Tilt}_{\mathcal{S}}[G]$ for $G = \mathrm{N}(0,2)$: (a) the mappings $\operatorname{Tilt}_{\mathcal{S}}[\cdot]$ and $\operatorname{Untilt}_{\mathcal{S}}[\cdot]$; (b) the densities of $G = \mathrm{N}(0,2)$ and $\operatorname{Tilt}_{\mathcal{S}}[G]$; (c) the corresponding $\Phi(\mathcal{S}; \mu)$; note that $\Phi(\mathcal{S}; 0) \neq 0$.
  • Figure 4: 95% confidence interval analyses for MEDLINE (2000-2018): each panel presents one estimand of interest, accompanied by 95% confidence intervals under different assumptions for the SNR distribution.
  • Figure S1: 95% confidence interval analyses for MEDLINE (2000-2018) with truncation set $\mathcal{S}_\text{half} = [2.24, \infty)$: each panel presents one estimand of interest, accompanied by 95% confidence intervals under different assumptions for the SNR distribution.
  • ...and 5 more figures
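
The remark in the Figure 3 caption that $\Phi(\mathcal{S}; 0) \neq 0$ can be checked numerically. The sketch below rests on assumptions not stated in this summary: it reads $\Phi(\mathcal{S}; \mu)$ as the probability that $\lvert Z \rvert \in \mathcal{S}$ when $Z \sim \mathrm{N}(\mu, 1)$, and it uses the truncation set $[2.24, \infty)$ from the Figure S1 caption. The function names `norm_cdf` and `selection_prob` are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def selection_prob(mu: float, lower: float = 2.24) -> float:
    """P(|Z| >= lower) for Z ~ N(mu, 1).

    Assumption: this is what the captions denote by Phi(S; mu),
    with S = [lower, infinity).
    """
    upper_tail = 1.0 - norm_cdf(lower - mu)  # P(Z >= lower)
    lower_tail = norm_cdf(-lower - mu)       # P(Z <= -lower)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Even a null effect (mu = 0) lands in S with positive probability.
    for mu in (0.0, 1.0, 2.24, 4.0):
        print(f"mu = {mu:4.2f}   Phi(S; mu) = {selection_prob(mu):.4f}")
```

Under this reading, the probability is symmetric in $\mu$ and increases with $\lvert \mu \rvert$, so truncation to $\mathcal{S}$ over-represents large effects without entirely excluding null ones.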

Theorems & Definitions (34)

  • Proposition 2: A necessary and sufficient condition for Assumption \ref{assum:publication_bias}
  • Definition 3: Symmetrized and folded prior
  • Theorem 4
  • Theorem 7: Observational equivalence
  • Remark 8: Untilting
  • Proposition 9
  • Proposition 10: Functional equivalence
  • Theorem 11
  • Remark 12: Selective tilting for other EB approaches
  • Proposition 13: Identifiability of estimands
  • ...and 24 more