
Robust Multi-view Co-expression Network Inference

Teodora Pandeva, Martijs Jonker, Leendert Hamoen, Joris Mooij, Patrick Forré

TL;DR

A robust method for high-dimensional graph inference from multiple independent studies. The co-expression matrix, among other model parameters, is identifiable up to a scaling factor, and the parameters are estimated with an Expectation-Maximization procedure.

Abstract

Unraveling the co-expression of genes across studies enhances the understanding of cellular processes. Inferring gene co-expression networks from transcriptome data presents many challenges, including spurious gene correlations, sample correlations, and batch effects. To address these complexities, we introduce a robust method for high-dimensional graph inference from multiple independent studies. We base our approach on the premise that each dataset is essentially a noisy linear mixture of gene loadings that follow a multivariate $t$-distribution with a sparse precision matrix, which is shared across studies. This allows us to show that we can identify the co-expression matrix up to a scaling factor among other model parameters. Our method employs an Expectation-Maximization procedure for parameter estimation. Empirical evaluation on synthetic and gene expression data demonstrates our method's improved ability to learn the underlying graph structure compared to baseline methods.
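
To make the generative premise concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes each view is formed as $\mathbf{S}_d A_d$ plus noise, where the columns of $\mathbf{S}_d$ are multivariate $t$ draws sharing one sparse precision matrix. The chain-graph precision, the Gaussian mixing matrix, the isotropic Gaussian noise, and the names `chain_precision`, `sample_multivariate_t`, and `simulate_views` are all illustrative assumptions.

```python
import numpy as np


def chain_precision(p, rho=0.4):
    """Sparse (tridiagonal) precision matrix of a chain graph; assumed for illustration."""
    return np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))


def sample_multivariate_t(rng, precision, df, size):
    """Draw `size` zero-mean multivariate t vectors via a Gaussian scale mixture."""
    p = precision.shape[0]
    scale = np.linalg.inv(precision)
    gauss = rng.multivariate_normal(np.zeros(p), scale, size=size)  # (size, p)
    chi2 = rng.chisquare(df, size=size)                             # (size,)
    return gauss / np.sqrt(chi2 / df)[:, None]


def simulate_views(rng, precision, n_samples, n_signal, df=3.0, noise_sd=1.0):
    """Simulate one p x n_d matrix per study, all sharing the same precision matrix."""
    p = precision.shape[0]
    views = []
    for n_d, k_d in zip(n_samples, n_signal):
        # Gene loadings: p x k_d, heavy-tailed, with the shared sparse precision.
        loadings = sample_multivariate_t(rng, precision, df, k_d).T
        # Study-specific mixing matrix (assumed Gaussian here): k_d x n_d.
        mixing = rng.standard_normal((k_d, n_d))
        # Isotropic Gaussian noise is a simplification of the paper's noise term.
        noise = noise_sd * rng.standard_normal((p, n_d))
        views.append(loadings @ mixing + noise)
    return views
```

Calling `simulate_views(np.random.default_rng(0), chain_precision(6), n_samples=[8, 5], n_signal=[3, 2])` would return two matrices of shapes `(6, 8)` and `(6, 5)`. The study-specific mixing differs between them while the gene-level dependence structure is shared.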

Paper Structure

This paper contains 31 sections, 8 theorems, 37 equations, 4 figures.

Key Result

Proposition 2.1

Let ${\mathbf{X}}_1,\ldots, {\mathbf{X}}_D$ with ${\mathbf{X}}_d\in \mathbb{R}^{p\times n_d}$ be random matrices with the following two representations: where for $d=1,\ldots, D$, both representations $A_d^{(1)}\in\mathbb{R}^{k_d^{(1)}\times n_d},B_d^{(1)}\in\mathbb{R}^{ (n_d-k_d^{(1)})\times n_d},{\mathbf{S}}_d^{(1)}\in \mathbb{R}^{p\times k_d^{(1)}},{\mathbf{Z}}_d^{(1)}\in \mathbb{R}^{p\times (

Figures (4)

  • Figure 1: Two variations of the gene regulation of genes $X, Y, Z$ (A) colored in purple and their corresponding co-expression network illustrated in (B) in yellow. In (A) (left), genes $X,Y,Z$ are regulated by a common latent factor, such as another gene. The example in (A) (right) shows that gene $X$ regulates both $Y$ and $Z$. In addition, a bi-directional dashed line indicates potential confounding between genes $Y$ and $Z$.
  • Figure 2: ROC curves summarizing the benchmark experiment on the data generated as described in \ref{subsec:sim} with a total of $100$ sample loadings. Each curve represents the average result of $100$ experiments. The number of signal loadings $k$ varies in each experiment, as indicated in the subplot titles. The results show that MVTLASSO outperforms TLASSO and GLASSO.
  • Figure 3: We set both $k=50$ and $r=50$ and varied $D=2, 5, 10$. For each case, we ran 100 synthetic experiments while varying the sparsity parameter. The averaged ROC curves show that increasing the number of views ($D$) improves performance.
  • Figure 4: True positive vs. possibly false positive edges obtained via stability selection for various penalty parameters. The results demonstrate that MVTLASSO consistently infers more true positive edges across all settings.
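
As a rough illustration of how edge-recovery ROC points like those in Figures 2 and 3 can be computed, the sketch below compares the off-diagonal support of a true precision matrix with an estimate. It thresholds the absolute estimated entries, which is an assumed stand-in for varying the sparsity parameter as the captions describe. The function name `edge_roc` and its arguments are hypothetical.

```python
import numpy as np


def edge_roc(true_precision, est_precision, thresholds):
    """Return (FPR, TPR) pairs for edge recovery at each threshold."""
    p = true_precision.shape[0]
    upper = np.triu_indices(p, k=1)            # each undirected edge counted once
    truth = true_precision[upper] != 0         # true edges
    scores = np.abs(est_precision[upper])      # edge strength in the estimate
    n_pos = max(int(truth.sum()), 1)
    n_neg = max(int((~truth).sum()), 1)
    points = []
    for tau in thresholds:
        pred = scores > tau                    # edges declared present
        tpr = np.sum(pred & truth) / n_pos
        fpr = np.sum(pred & ~truth) / n_neg
        points.append((float(fpr), float(tpr)))
    return points
```

Averaging such points over repeated simulations gives curves comparable in spirit to the ones summarized in the captions.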

Theorems & Definitions (14)

  • Definition 2.1
  • Proposition 2.1
  • Lemma B.1
  • proof
  • Theorem B.1
  • proof
  • Theorem B.2
  • proof
  • Corollary B.1
  • Theorem C.1: arellano1995some
  • ...and 4 more