Robust Multi-view Co-expression Network Inference

Teodora Pandeva; Martijs Jonker; Leendert Hamoen; Joris Mooij; Patrick Forré

TL;DR

A robust method for inferring high-dimensional co-expression graphs from multiple independent studies. The co-expression matrix, among other model parameters, is identifiable up to a scaling factor, and the parameters are estimated with an Expectation-Maximization procedure.

Abstract

Unraveling the co-expression of genes across studies enhances the understanding of cellular processes. Inferring gene co-expression networks from transcriptome data presents many challenges, including spurious gene correlations, sample correlations, and batch effects. To address these complexities, we introduce a robust method for high-dimensional graph inference from multiple independent studies. We base our approach on the premise that each dataset is essentially a noisy linear mixture of gene loadings that follow a multivariate $t$-distribution with a sparse precision matrix, which is shared across studies. This allows us to show that we can identify the co-expression matrix up to a scaling factor among other model parameters. Our method employs an Expectation-Maximization procedure for parameter estimation. Empirical evaluation on synthetic and gene expression data demonstrates our method's improved ability to learn the underlying graph structure compared to baseline methods.
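The generative premise in the abstract can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions, not the authors' implementation: the chain-graph precision matrix, the Gaussian mixing matrix and noise, the independence of the loading columns, and all function and parameter names (`sparse_precision`, `sample_t_loadings`, `simulate_view`, `nu`, `noise_sd`) are hypothetical choices made for illustration only.

```python
import numpy as np


def sparse_precision(p, off_diag=0.4):
    """Tridiagonal (chain-graph) precision matrix: sparse and positive definite.

    Assumption: a chain graph is used purely as an easy sparse example.
    """
    theta = np.eye(p)
    idx = np.arange(p - 1)
    theta[idx, idx + 1] = off_diag
    theta[idx + 1, idx] = off_diag
    return theta


def sample_t_loadings(theta, k, nu, rng):
    """Draw k independent p-dimensional multivariate-t columns.

    Each column has scale matrix inv(theta) and nu degrees of freedom,
    built as a Gaussian draw divided by sqrt(chi2 / nu).
    """
    p = theta.shape[0]
    chol = np.linalg.cholesky(theta)          # theta = chol @ chol.T
    e = rng.standard_normal((p, k))
    z = np.linalg.solve(chol.T, e)            # columns have covariance inv(theta)
    u = rng.chisquare(nu, size=k)             # one scaling variable per column
    return z / np.sqrt(u / nu)                # shape (p, k)


def simulate_view(theta, n, k, nu, noise_sd, rng):
    """One study: noisy linear mixture of k heavy-tailed gene loadings."""
    p = theta.shape[0]
    loadings = sample_t_loadings(theta, k, nu, rng)   # (p, k)
    mixing = rng.standard_normal((k, n))              # hypothetical mixing matrix
    noise = noise_sd * rng.standard_normal((p, n))
    return loadings @ mixing + noise                  # (p, n)


rng = np.random.default_rng(0)
theta = sparse_precision(20)
# Three studies with different sample sizes share the same precision matrix.
views = [
    simulate_view(theta, n=n_d, k=5, nu=4.0, noise_sd=0.1, rng=rng)
    for n_d in (30, 40, 50)
]
```

Each element of `views` is a genes-by-samples matrix. The sample sizes differ across studies, while the sparse precision matrix `theta` that defines the graph is the same for all of them.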

Paper Structure (31 sections, 8 theorems, 37 equations, 4 figures)

Introduction
Robust Co-Expression Inference from non-i.i.d Samples
Identifiability Guarantees
Parameter Estimation
The Expectation-Maximization Procedure
E-step:
M-step:
Results
Simulated Data
Gene Co-Expression Inference for Bacillus Subtilis
Discussion
Related Work
Identifiability
Proof of \ref{lemma:gaussian}
Proof of \ref{thm:single-view}
...and 16 more sections

Key Result

Proposition 2.1

Let ${\mathbf{X}}_1,\ldots, {\mathbf{X}}_D$ with ${\mathbf{X}}_d\in \mathbb{R}^{p\times n_d}$ be random matrices with the following two representations: where for $d=1,\ldots, D$, both representations $A_d^{(1)}\in\mathbb{R}^{k_d^{(1)}\times n_d},B_d^{(1)}\in\mathbb{R}^{ (n_d-k_d^{(1)})\times n_d},{\mathbf{S}}_d^{(1)}\in \mathbb{R}^{p\times k_d^{(1)}},{\mathbf{Z}}_d^{(1)}\in \mathbb{R}^{p\times (

Figures (4)

Figure 1: Two variations of the gene regulation of genes $X, Y, Z$ (A) colored in purple and their corresponding co-expression network illustrated in (B) in yellow. In (A) (left), genes $X,Y,Z$ are regulated by a common latent factor, such as another gene. The example in (A) (right) shows that gene $X$ regulates both $Y$ and $Z$. In addition, a bi-directional dashed line indicates potential confounding between genes $Y$ and $Z$.
Figure 2: ROC curves summarizing the benchmark experiment on the data generated as described in \ref{subsec:sim} with a total of $100$ sample loadings. Each curve represents the average result of $100$ experiments. The number of signal loadings $k$ varies in each experiment, as indicated in the subplot titles. The results show that MVTLASSO outperforms TLASSO and GLASSO.
Figure 3: We set both $k=50$ and $r=50$ and varied $D=2, 5, 10$. For each case, we ran 100 synthetic experiments while varying the sparsity parameter. The averaged ROC curves show that increasing the number of views ($D$) improves performance.
Figure 4: True positive vs. possibly false positive edges obtained via stability selection for various penalty parameters. The results demonstrate that MVTLASSO consistently infers more true positive edges across all settings.

Theorems & Definitions (14)

Definition 2.1
Proposition 2.1
Lemma B.1
proof
Theorem B.1
proof
Theorem B.2
proof
Corollary B.1
Theorem C.1: arellano1995some
...and 4 more
