Bayesian covariance regression for differential network analysis of zero-inflated microbiome data

Zichun Xu, Jing Ma

Abstract

Microbial interaction networks can rewire in response to host and environmental factors, yet most existing methods for network estimation treat the covariance structure as static across samples. We propose TRECOR, a Bayesian covariance regression framework for inferring covariate-dependent microbial covariation networks from zero-inflated compositional count data. The method models microbiome counts through a latent multivariate normal distribution defined on the internal nodes of a phylogenetic tree, where both the mean and the covariance of the latent variables depend on covariates. The covariance is decomposed into a sparse baseline component, representing a stable microbial covariation network, and a low-rank covariate-dependent perturbation that captures network rewiring. By exploiting the binomial factorization of the multinomial distribution under the logistic-tree-normal representation, the model achieves full conjugacy, and posterior inference proceeds via an efficient Gibbs sampler. In simulations, TRECOR substantially outperforms covariance regression applied to transformed counts, demonstrating the importance of explicitly modeling the compositional sampling layer. Applied to gut microbiome data from 531 individuals across three countries, the method identifies age as the covariate with the largest effect on microbial covariation, a pattern not revealed by mean-based analysis alone. The age-associated differential network is enriched for Enterobacteriaceae and related families, consistent with known developmental shifts in the gut microbiota, while country-associated differential networks implicate diet-related taxa.
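The covariance decomposition described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the formula, so the form used here, $\mathbf{\Sigma}(\boldsymbol{x}) = \mathbf{\Sigma}_0 + \sum_{r=1}^R \mathbf{B}_r \boldsymbol{x}\boldsymbol{x}^\top \mathbf{B}_r^\top$, is an assumption borrowed from standard covariance regression and may differ from TRECOR's exact parameterization. The names `sigma0`, `B`, and `covariance_at`, the dimensions, and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): q latent dimensions, d covariates
# (first entry of x is an intercept), R rank-one perturbation terms.
q, d, R = 4, 3, 2

# Sparse baseline covariance: tridiagonal and diagonally dominant,
# hence symmetric positive definite.
sigma0 = np.eye(q) + 0.3 * (np.eye(q, k=1) + np.eye(q, k=-1))

# Hypothetical loading matrices B_r, each of shape (q, d), stacked as (R, q, d).
B = rng.normal(scale=0.5, size=(R, q, d))


def covariance_at(x, sigma0, B):
    """Return sigma0 + sum_r (B_r x)(B_r x)^T for a covariate vector x."""
    U = B @ x                 # shape (R, q): row r holds B_r x
    return sigma0 + U.T @ U   # U.T @ U sums the R outer products


# Two covariate profiles sharing the same intercept.
x_a = np.array([1.0, 0.0, 0.2])
x_b = np.array([1.0, 1.0, 0.8])

sigma_a = covariance_at(x_a, sigma0, B)
sigma_b = covariance_at(x_b, sigma0, B)

# Differential network between the two profiles; the baseline cancels.
diff = sigma_b - sigma_a
print(np.round(diff, 3))
```

Under this assumed form, every $\mathbf{\Sigma}(\boldsymbol{x})$ stays positive definite because a positive semidefinite term of rank at most $R$ is added to a positive definite baseline, and the difference between two covariate profiles depends only on the loading matrices.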

Paper Structure

This paper contains 33 sections, 1 theorem, 29 equations, 13 figures, 3 tables.

Key Result

Lemma 2.1

Define: where $\boldsymbol{b}_{jr} \in {\mathbb R}^{q}$ is the $j$-th column of $\mathbf{B}_r$. The set of parameters $\left\{ {\mathbf{B}_r} \right\}_{r=1}^R$ can thus be equivalently represented as $\left\{ {\mathbf{B}^{(j)}} \right\}_{j=1}^d$. Suppose $\left\{ {\mathbf{A}_r} \right\}_{r=1}^R$ is another

Figures (13)

  • Figure 1: Illustration of covariate-dependent covariation in the Yatsunenko et al. (2012) study. (A) Feature variances; (B) Feature correlations.
  • Figure 2: Average ROC curves for recovering nonzero off-diagonal entries of the population covariance matrix $\mathbf{\Sigma}$ across 100 replications. Panels correspond to $n\in\{150,250\}$ and three sparse structures for $\mathbf{\Sigma}$ (tridiagonal, scale-free, and tree-based). Curves compare gLASSO (coral dashed), TRECOR (green solid), TRECOR-oracle (light blue longdash) and CovReg (violet dotdash).
  • Figure 3: Left panel: posterior mean effect size distribution for each covariate (excluding intercept), represented by the posterior distribution of $\left\lVert \boldsymbol{b}_{j0} \right\rVert_2^2$ where $\boldsymbol{b}_{j0} \in {\mathbb R}^{q}$ is the $j$th column of $\mathbf{B}_0$. Right panel: posterior covariance effect size distribution for each covariate (excluding intercept), represented by the posterior distribution of $\left\lVert \mathbf{B}^{(j)} \right\rVert_F$.
  • Figure 4: (A) Posterior mean of the population correlation matrix $\mathbf{\Sigma}$, after filtering out statistically insignificant off-diagonal entries. Rows and columns are ordered according to a depth-first traversal of the phylogenetic tree. (B) The 10 largest correlations in absolute value from the population correlation matrix, overlaid on the phylogenetic tree. Red and blue edges indicate positive and negative network links, respectively, with line width proportional to the absolute posterior mean correlation. Node color indicates the degree of each node in the correlation network.
  • Figure 5: Average precision-recall curves for recovering nonzero off-diagonal entries of the population covariance matrix $\mathbf{\Sigma}$ across 100 replications. Panels correspond to $n\in\{150,250\}$ and three sparse structures for $\mathbf{\Sigma}$ (tridiagonal, scale-free, and tree-based). Curves compare gLASSO (coral dashed), TRECOR (green solid), TRECOR-oracle (light blue longdash) and CovReg (violet dotdash).
  • ...and 8 more figures

Theorems & Definitions (1)

  • Lemma 2.1