Empirical Bayes for Data Integration

Paul Rognon-Vael, David Rossell

TL;DR

This work discusses the use of empirical Bayes for data integration in settings where one wishes to learn structure and one only has access to incomplete data from previous studies, such as summaries, estimates or lists of relevant features.

Abstract

We discuss the use of empirical Bayes for data integration, in the sense of transfer learning. Our main interest is in settings where one wishes to learn structure (e.g. feature selection) and one only has access to incomplete data from previous studies, such as summaries, estimates or lists of relevant features. We discuss differences between full Bayes and empirical Bayes, and develop a computational framework for the latter. We discuss how empirical Bayes attains consistent variable selection under weaker conditions (sparsity and betamin assumptions) than full Bayes and other standard criteria do, and how it attains faster convergence rates. Our high-dimensional regression examples show that fully Bayesian inference enjoys excellent properties, and that data integration with empirical Bayes can offer moderate yet meaningful improvements in practice.

Paper Structure

This paper contains 32 sections, 16 theorems, 159 equations, 5 figures, 10 tables.

Key Result

Theorem 1

If Assumption A1 holds and $\kappa_b$ implied by $(\bm{\omega},g)$ satisfies A2 and A3, then $\lim_{n \to \infty} E \left[ \pi(\bm{\gamma}^* \mid \bm{y}, \bm{\omega}) \right] =1$.

Figures (5)

  • Figure 1: Data integration with full data (left) and with meta-covariates (right). With full data one observes $q$ datasets with underlying parameters $\bm{\theta}_1,\ldots,\bm{\theta}_q$. The hyper-parameter $\bm{\omega}$ allows sharing information. With meta-covariates one has access to summaries $\bm{z}_1,\ldots,\bm{z}_q$ extracted from unobserved data $\tilde{\bm{y}}_1,\ldots,\tilde{\bm{y}}_q$. These summaries inform the parameter $\bm{\theta}$ governing the dataset of interest $\bm{y}$
  • Figure 2: Multiple Binomial experiments are conducted with $n=30$ (top) and $n=100$ (bottom). The left panels display two prior distributions and the Binomial distribution for success probabilities $\theta=0.1$ and $\theta=0.35$. The right panels display the posterior distribution from experiments with $y=5$ and $y=15$ (top), and $y=17$ and $y=50$ (bottom)
  • Figure 3: Mean squared estimation error (left) and power (right) in simulation study
  • Figure 4: False discovery rate in simulation study. Scenario 1 corresponds to $(\omega_1=2, \omega_2=0)$, Scenario 2 to $(\omega_1=1, \omega_2=0)$ and Scenario 3 to $(\omega_1=0, \omega_2=0)$
  • Figure 5: TGFB data. Left: marginal correlations with the outcome vs. whether the gene is in or out of the mouse list. Right: posterior inclusion probabilities using empirical Bayes vs. using a Beta-Binomial prior
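
The Binomial setting of Figure 2 can be illustrated with a minimal sketch, under stated assumptions. The caption does not specify the two prior distributions, so the Beta hyper-parameters below are hypothetical placeholders chosen to have means 0.1 and 0.35; the function names are also illustrative and are not taken from the paper. The sketch computes the conjugate Beta posterior for a Binomial count and, in the spirit of empirical Bayes, picks the candidate prior with the larger marginal likelihood of the observed data.

```python
from math import comb, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_params(a, b, y, n):
    """Beta(a, b) prior with y successes in n Binomial trials gives
    a Beta(a + y, b + n - y) posterior (conjugate update)."""
    return a + y, b + n - y


def posterior_mean(a, b, y, n):
    """Mean of the Beta posterior for the success probability."""
    a_post, b_post = posterior_params(a, b, y, n)
    return a_post / (a_post + b_post)


def log_marginal(a, b, y, n):
    """Log marginal likelihood of y under a Beta(a, b) prior
    (Beta-Binomial): C(n, y) * B(a + y, b + n - y) / B(a, b)."""
    return log(comb(n, y)) + log_beta(a + y, b + n - y) - log_beta(a, b)


def select_prior(priors, y, n):
    """Empirical-Bayes-style choice: return the name of the candidate
    prior that maximizes the marginal likelihood of the data."""
    return max(priors, key=lambda name: log_marginal(*priors[name], y, n))


# Hypothetical priors (assumption, not from the paper):
# Beta(2, 18) has mean 0.1 and Beta(7, 13) has mean 0.35.
priors = {"mean_0.10": (2.0, 18.0), "mean_0.35": (7.0, 13.0)}

if __name__ == "__main__":
    # Counts from the top row of Figure 2 (n = 30).
    for y, n in [(5, 30), (15, 30)]:
        chosen = select_prior(priors, y, n)
        a, b = priors[chosen]
        print(y, n, chosen, round(posterior_mean(a, b, y, n), 3))
```

Choosing between two fixed priors is a deliberately simplified stand-in for estimating a continuous hyper-parameter. It only shows the mechanism by which the data inform the prior.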

Theorems & Definitions (27)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Corollary 1
  • Proposition 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • ...and 17 more