Sparse species interactions reproduce abundance correlation patterns in microbial communities

José Camacho-Mateu, Aniello Lampo, Matteo Sireci, Miguel Ángel Muñoz, José A. Cuesta

TL;DR

This work presents population models implementing species interactions that enable us to replicate macroecological patterns of species correlations not captured by existing models, and robustly support the importance of species interaction networks.

Abstract

During the last decades macroecology has identified broad-scale patterns of abundances and diversity of microbial communities and put forward some potential explanations for them. However, these advances are not paralleled by a full understanding of the dynamical processes behind them. In particular, abundance fluctuations of different species are found to be correlated, both across time and across communities in metagenomic samples. Reproducing such correlations through appropriate population models remains an open challenge. The present paper tackles this problem and points to sparse species interactions as a necessary mechanism to account for them. Specifically, we discuss several possibilities to include interactions in population models and recognize Lotka-Volterra constants as a successful ansatz. For this, we design a Bayesian inference algorithm to extract sets of interaction constants able to reproduce empirical probability distributions of pairwise correlations for diverse biomes. Importantly, the inferred models still reproduce well-known single-species macroecological patterns concerning abundance fluctuations across both species and communities. Endorsed by the agreement with the empirically observed phenomenology, our analyses provide insights on the properties of the networks of microbial interactions, revealing that sparsity is a crucial feature.

Paper Structure (1 section, 7 equations, 4 figures)

Table of Contents

  1. Introduction

Figures (4)

  • Figure 1: Infographic of the population dynamics and the resulting macroecological patterns. Panel (a) portrays, as an illustrative example, three individual-species (color-coded) time courses at equally spaced times (longitudinal data), resulting from the integration of Eq. \ref{eq:mainEquation}. The fluctuations of the abundances around their mean across samples (abundance fluctuation distribution, AFD) are well described by a gamma distribution, as shown in panel (b) (see Figs. S4 and S5 of the SI). For each species, this distribution is characterized by its mean value $\bar{x}_i$ and its variance $\sigma_i^2$. These two quantities are linked by Taylor's law, $\sigma_i^2\propto\bar{x}_i^2$ (panel (b)). The mean abundances of all species follow a lognormal distribution (mean abundance distribution, MAD; panel (b)). Further details about Taylor's law and the MAD are presented in Figs. S6 and S7 of the SI. Panel (c) illustrates the correlations between abundance fluctuations of pairs of species across samples (one point for each sample/realization). The top-left plot shows the case of two uncorrelated species, whereas the top-right plot shows two positively correlated species. The bottom plot shows the distribution of Pearson's coefficients $\rho_{ij}$ of all pairs of species. Empirically, this distribution is found to generally cover the entire range $-1\le\rho_{ij}\le 1$ and to exhibit a peak at negative values.
  • Figure 2: Distributions of Pearson's abundance correlation coefficients as obtained in the model with (left panel) a few samples of the noise correlation matrix $\mathbf{W}$ (each with a different gray shade) or (right panel) random samples of the Lotka-Volterra matrix $\mathbf{A}$. The black solid lines portray in each case the empirical distribution as obtained from the Seawater microbiome (species that appear in fewer than 50% of the communities have been filtered out), while the blue ones represent the distribution of correlations as obtained from the model without interactions. In the left plot, colored circles show the results for a few samples of matrices $\mathbf{W}$ (see 'Material and methods' for details of the sampling procedure); Lotka-Volterra constants are chosen as $a_{ij}=-\delta_{ij} / K_i$, with carrying capacities $K_i$ sampled from a lognormal distribution with mean $0.1$ and standard deviation $0.5$, as for the stochastic logistic model (SLM) [Grilli2020]. The results shown in this figure are typical (see Secs. 7B and C of the SI for a more thorough exploration). In the right plot, colored circles represent correlations resulting from the stochastic Lotka-Volterra model (SLVM) with $\mathbf{W}=w\mathbf{I}$ and Lotka-Volterra constants $a_{ij}$ ($i\ne j$) sampled from a Gaussian distribution with zero mean and standard deviation $0.03$. A random selection of $60\%$ of such constants is set to zero (i.e., the connectance of the interaction matrix is $C=0.4$).
  • Figure 3: Grilli's three macroecological laws as a function of the interaction parameters. Specifically, the figure shows the abundance fluctuation distribution (AFD, panels (a)--(c)), Taylor's law (panels (d)--(f)), and the mean abundance distribution (MAD, panels (g)--(i)) for different values of the species number $S$ (panels (a), (d), (g)), the connectance $C$ (panels (b), (e), (h)), and the standard deviation of the interaction constants $\sigma$ (panels (c), (f), (i)). Results have been averaged over $100$ realizations of the SLVM (Eq. \ref{eq:mainEquation}), each one with a different random interaction matrix. Results including all realizations are depicted as a cloud of gray points, whereas averages are shown as colored bullets. The AFD obtained for a given realization contains the results for all species, represented in terms of rescaled logarithmic abundances ($z=\mathop{\mathrm{Var}}\nolimits(x)^{-1/2}\log(x/\bar{x})$). Solid black lines correspond to gamma distributions. The MAD plots (g)--(i) are obtained by properly rescaling the mean abundances, and are fitted by a normalized (zero mean, unit standard deviation) lognormal distribution (black solid line). Similarly, the black straight lines in panels (d)--(f) describe the relation $\mathop{\mathrm{Var}}\nolimits(x_i)\propto\bar{x}_i^2$ in logarithmic scale. Panels (j), (k), and (l) illustrate the limits of the weak-interaction regime across the set of parameters that characterize species interactions. The plots quantify the compliance with (j) a gamma AFD, (k) Taylor's law, and (l) a lognormal MAD, within the region where the system is stable and feasible. Each pixel corresponds to a combination of values of the network connectance $C$ (horizontal axis) and the standard deviation $\sigma$ of the distribution of interactions (vertical axis). The color of the pixel quantifies the distance from the AFD to a gamma distribution (j), the value of the exponent $\gamma$ in the relationship $\mathop{\mathrm{Var}}\nolimits(x_i)\propto\bar{x}_i^{\gamma}$ (k), and the distance from the MAD to a lognormal distribution (l), averaged over a sample of $100$ realizations. Gray areas mark the region of the parameter space where the resulting systems are neither stable nor feasible. In these plots $S=50$, $\tau_i=0.1$, $w=0.1$, and the carrying capacities are sampled from a lognormal distribution (mean $0.1$, standard deviation $0.5$).
  • Figure 4: Abundance correlation distributions for real and simulated communities. In (a), different colored bullets correspond to different biomes selected from the EBI metagenomics platform [Mitchell2018] (namely Seawater, River, Lake, Glacier, and Sludge communities). Black dashed lines portray the distribution of Pearson's coefficients for the abundance correlation of all pairs of species resulting from the SLM. Gray curves show the same distributions as obtained from the SLVM (cf. Eq. \ref{eq:mainEquation}), with the Lotka-Volterra interaction constants inferred using the Bayesian approach described in 'Material and methods'. The inset in the bottom panel of (a) presents the quantile-quantile (QQ) plot, comparing quantiles of the empirical and the synthetic distributions, for the different biomes. The dots sit on the bisector line, indicating the close alignment between the quantiles of both distributions for each biome. This is consistent with the results of a Kolmogorov-Smirnov test ($p$-values of Seawater and River: $0.99$; Lake and Glacier: $0.90$; Sludge: $0.81$). The top panel of (b) shows the Euclidean distance between the logarithms of the empirical and the synthetic distributions (for Seawater, using only species appearing in at least 50% of the samples) vs. the iterations of the MCMC. The inset shows, as blue and grey dots, the synthetic distributions obtained at the iterations marked by dashed lines of the corresponding color. The empirical distribution is drawn with a solid black line. (c) shows the distribution of absolute values of the interaction constants ($|a_{ij}|$) for a collection of over $200$ matrices generated through the MCMC method (green bullets). This distribution can be fitted by a convex combination of two Gaussian distributions (grey solid line). The black (blue) dashed line fits the broader (narrower) Gaussian. Practically all coefficients in the narrower Gaussian are negligible compared to those in the broader one. Hence, the presence of a small fraction of large coefficients gives rise to an effective connectance ($C_{\text{eff}}$) in the associated network. The inset shows a histogram of the values of $C_{\text{eff}}$. It peaks around $0.05$, with tails extending approximately in the range $0.01\lesssim C_{\text{eff}}\lesssim 1.4$. This reveals the high sparsity of the interactions.
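
The quantities named in the captions above can be illustrated with a short Python sketch. It is not the authors' code: the function names, the use of NumPy, the bin count, and the `floor` used to avoid taking the logarithm of empty bins are all assumptions made for illustration. It also assumes that the quoted lognormal "mean $0.1$, standard deviation $0.5$" refers to the underlying normal distribution of $\log K_i$, which the captions do not specify. The sketch draws a sparse random interaction matrix as described for Figure 2, then computes the pairwise Pearson coefficients $\rho_{ij}$, the Taylor exponent $\gamma$, and a Euclidean distance between log-histograms of the kind mentioned for Figure 4(b).

```python
import numpy as np


def sample_interaction_matrix(S=50, C=0.4, sigma=0.03, rng=None):
    """Sparse random Lotka-Volterra matrix in the spirit of the Figure 2 caption."""
    rng = np.random.default_rng() if rng is None else rng
    # ASSUMPTION: (0.1, 0.5) are the mean and std of log(K), not of K itself.
    K = rng.lognormal(mean=0.1, sigma=0.5, size=S)
    A = rng.normal(0.0, sigma, size=(S, S))
    # Set an exact fraction (1 - C) of the off-diagonal constants to zero.
    rows, cols = np.nonzero(~np.eye(S, dtype=bool))
    n_zero = int(round((1.0 - C) * rows.size))
    drop = rng.choice(rows.size, size=n_zero, replace=False)
    A[rows[drop], cols[drop]] = 0.0
    np.fill_diagonal(A, -1.0 / K)  # a_ii = -1 / K_i
    return A, K


def pairwise_pearson(X):
    """Pearson coefficients rho_ij (i < j) for X of shape (samples, species)."""
    R = np.corrcoef(X, rowvar=False)
    return R[np.triu_indices(R.shape[0], k=1)]


def taylor_exponent(X):
    """Slope gamma of log Var(x_i) versus log mean(x_i) across species."""
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)
    gamma, _intercept = np.polyfit(np.log(mean), np.log(var), 1)
    return gamma


def log_histogram_distance(rho_a, rho_b, bins=20, floor=1e-6):
    """Euclidean distance between the logs of two correlation histograms.

    ASSUMPTION: empty bins are floored at `floor` before taking the log.
    """
    edges = np.linspace(-1.0, 1.0, bins + 1)
    p, _ = np.histogram(rho_a, bins=edges, density=True)
    q, _ = np.histogram(rho_b, bins=edges, density=True)
    diff = np.log(np.maximum(p, floor)) - np.log(np.maximum(q, floor))
    return float(np.linalg.norm(diff))


# Usage on synthetic, non-interacting abundances (not real microbiome data):
# gamma-distributed fluctuations with Var(x_i) = mean_i**2 / 2.
rng = np.random.default_rng(0)
A, K = sample_interaction_matrix(rng=rng)
means = rng.lognormal(0.0, 1.0, size=50)
X = rng.gamma(shape=2.0, scale=means / 2.0, size=(5000, 50))
rho = pairwise_pearson(X)
gamma = taylor_exponent(X)
```

Because the synthetic species here are independent, the resulting $\rho_{ij}$ cluster around zero, and the fitted exponent should be close to $2$. Reproducing the broad empirical distributions would require simulating the interacting model itself, which this sketch does not attempt.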