
Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration

Raphiel J. Murden, Ganzhong Tian, Deqiang Qiu, Benjamin B. Risk

Abstract

Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and separates this variation from the variation unique to each set of features. We develop an expectation-maximization (EM) algorithm to estimate a probabilistic model for the JIVE framework. The model extends probabilistic principal component analysis to multiple data sets. Our maximum likelihood approach estimates the joint and individual components simultaneously, which can lead to greater accuracy than other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful sources of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.
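For concreteness, here is one plausible way to write the model for $K=2$ data sets. It assumes, as in probabilistic PCA, centered data, standard normal joint scores $\mathbf{z}_J$ and individual scores $\mathbf{z}_{Ik}$, isotropic noise, and mutual independence of all latent and noise vectors; the symbols $\mathbf{z}$, $r_J$, $r_k$, and $p_k$ are illustrative notation, and the paper's own specification may differ in details.

$$
\mathbf{x}_k = \mathbf{W}_{Jk}\,\mathbf{z}_J + \mathbf{W}_{Ik}\,\mathbf{z}_{Ik} + \boldsymbol{\varepsilon}_k, \qquad
\mathbf{z}_J \sim N(\mathbf{0}, \mathbf{I}_{r_J}), \quad
\mathbf{z}_{Ik} \sim N(\mathbf{0}, \mathbf{I}_{r_k}), \quad
\boldsymbol{\varepsilon}_k \sim N(\mathbf{0}, \sigma_k^2 \mathbf{I}_{p_k}), \qquad k = 1, 2,
$$

so the stacked vector $\mathbf{x} = (\mathbf{x}_1^\top, \mathbf{x}_2^\top)^\top$ satisfies $\mathbf{x} \sim N(\mathbf{0}, \boldsymbol{\Sigma})$ with

$$
\boldsymbol{\Sigma} =
\begin{pmatrix}
\mathbf{W}_{J1}\mathbf{W}_{J1}^\top + \mathbf{W}_{I1}\mathbf{W}_{I1}^\top + \sigma_1^2 \mathbf{I}_{p_1} & \mathbf{W}_{J1}\mathbf{W}_{J2}^\top \\
\mathbf{W}_{J2}\mathbf{W}_{J1}^\top & \mathbf{W}_{J2}\mathbf{W}_{J2}^\top + \mathbf{W}_{I2}\mathbf{W}_{I2}^\top + \sigma_2^2 \mathbf{I}_{p_2}
\end{pmatrix}.
$$

Under this form only the joint loadings enter the off-diagonal block, so all cross-covariance between the two data sets is carried by the joint components. With a single data set and no individual term, $\boldsymbol{\Sigma}$ reduces to the probabilistic PCA covariance $\mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I}$.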

Paper Structure

This paper contains 23 sections, 1 theorem, 10 equations, and 8 figures.

Key Result

Theorem 1

Suppose there are $K=2$ data sets, and let $f_\Phi$ denote the multivariate normal density defined by the parameters $\Phi = \{ \mathbf{W}_{J1}, \mathbf{W}_{J2}, \mathbf{W}_{I1}, \mathbf{W}_{I2}, \sigma_1^2, \sigma_2^2 \}$ and the assumptions of the ProJIVE model (eqn:jive.ml.mod). Let $f_{\Phi^*}$ denote the multivariate normal density …
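To make the setting of Theorem 1 concrete, the sketch below assumes a zero-mean model $\mathbf{x}_k = \mathbf{W}_{Jk}\mathbf{z}_J + \mathbf{W}_{Ik}\mathbf{z}_{Ik} + \boldsymbol{\varepsilon}_k$ with standard normal scores, $\boldsymbol{\varepsilon}_k \sim N(\mathbf{0}, \sigma_k^2\mathbf{I})$, and all latent and noise vectors independent. This is an assumption about eqn:jive.ml.mod, not a quotation of it. Under that assumption, $f_\Phi$ depends on $\Phi$ only through the implied covariance of the stacked vector $(\mathbf{x}_1, \mathbf{x}_2)$. The code builds that covariance for hypothetical toy loadings and checks that rotating both joint loading matrices by the same orthogonal $\mathbf{R}$ leaves it unchanged, the familiar rotational ambiguity of probabilistic PCA. It illustrates the setting only and does not reproduce the theorem's statement; the function names are hypothetical.

```python
import math

# Small dense-matrix helpers (lists of row lists) so the sketch needs only the standard library.
def matmul(a, b):
    """Matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scaled_identity(n, s):
    return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]

def implied_covariance(WJ1, WJ2, WI1, WI2, s2_1, s2_2):
    """Covariance of the stacked vector (x1, x2) under the ASSUMED model
    x_k = W_Jk z_J + W_Ik z_Ik + e_k, standard normal scores, e_k ~ N(0, s2_k I)."""
    p1, p2 = len(WJ1), len(WJ2)
    S11 = add(add(matmul(WJ1, transpose(WJ1)), matmul(WI1, transpose(WI1))),
              scaled_identity(p1, s2_1))
    S22 = add(add(matmul(WJ2, transpose(WJ2)), matmul(WI2, transpose(WI2))),
              scaled_identity(p2, s2_2))
    S12 = matmul(WJ1, transpose(WJ2))  # cross-covariance comes only from the joint loadings
    S21 = transpose(S12)
    top = [r1 + r2 for r1, r2 in zip(S11, S12)]
    bottom = [r1 + r2 for r1, r2 in zip(S21, S22)]
    return top + bottom

def rotation(theta):
    """2 x 2 orthogonal (rotation) matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Hypothetical toy parameters: p1 = 3 and p2 = 2 features, joint rank 2, individual rank 1 each.
WJ1 = [[1.0, 0.5], [0.2, -0.3], [0.0, 0.8]]
WJ2 = [[0.4, 0.1], [-0.6, 0.9]]
WI1 = [[0.3], [0.0], [-0.2]]
WI2 = [[0.7], [0.1]]
Sigma = implied_covariance(WJ1, WJ2, WI1, WI2, 0.5, 0.25)

# Rotate both joint loading blocks by the same orthogonal R: W_Jk -> W_Jk R.
R = rotation(0.7)
Sigma_rot = implied_covariance(matmul(WJ1, R), matmul(WJ2, R), WI1, WI2, 0.5, 0.25)

max_diff = max(abs(a - b) for ra, rb in zip(Sigma, Sigma_rot) for a, b in zip(ra, rb))
print(f"max |Sigma - Sigma_rot| = {max_diff:.2e}")
```

The printed difference should be at floating-point round-off level, because $(\mathbf{W}_{J1}\mathbf{R})(\mathbf{W}_{J2}\mathbf{R})^\top = \mathbf{W}_{J1}\mathbf{W}_{J2}^\top$ for any orthogonal $\mathbf{R}$, and likewise for the diagonal blocks. Plain nested lists keep the example dependency-free; NumPy would be the natural choice in practice.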

Figures (8)

  • Figures 1 to 8 (captions unavailable)
