gFlora: a topology-aware method to discover functional co-response groups in soil microbial communities

Nan Chen, Merlijn Schram, Doina Bucur

TL;DR

gFlora addresses the challenge of identifying functional co-response groups in soil microbiomes by integrating taxa abundances, a functional readout, and a topology-aware co-occurrence network via graph convolution. It optimizes a binary group indicator to maximize the correlation between the group’s topologically informed abundance and the functional variable, using a genetic algorithm with regularization to prevent overfitting. Across two real datasets (bacteria and nematodes) and the PMN functional measure, gFlora outperforms the state-of-the-art method, EQO, discovers under-studied taxa, and reveals that important taxa span multiple network clusters yet form interconnected functional units. The method provides a practical, topology-aware tool for ecological hypothesis generation and soil health assessment, with code available on GitHub.

Abstract

We aim to learn the functional co-response group: a group of taxa whose co-response effect (the representative characteristic of the group, given by the total topological abundance of its taxa) co-responds to (associates well statistically with) a functional variable. Unlike the state-of-the-art method, we model the soil microbial community as an ecological co-occurrence network, with taxa as nodes (weighted by their abundance) and their relationships (combining spatial and functional ecological aspects) as edges (weighted by the strength of the relationship). We then design a method called gFlora, which uses graph convolution over this co-occurrence network to compute the co-response effect of the group, so that the network topology is also considered in the discovery process. We evaluate gFlora on two real-world soil microbiome datasets (bacteria and nematodes) and compare it with the state-of-the-art method. gFlora outperforms it on all evaluation metrics and discovers new functional evidence for taxa that have so far been under-studied. We show that the graph convolution step is crucial for taxa with relatively low abundance (thus removing the bias towards taxa with higher abundance), and that the discovered bacteria of different genera are spread across the co-occurrence network yet remain tightly connected among themselves, demonstrating that topologically they fill different but collaborative functional roles in the ecological community.
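
To make the pipeline described above more concrete, here is a minimal sketch of the two core steps: smoothing abundances over the co-occurrence network, then scoring a binary group indicator $\mathbf{x}$ by its correlation with the functional variable. Several things here are assumptions and not taken from the paper: the symmetric normalisation with self-loops (a common choice for graph convolution), the use of Pearson correlation, the helper names, and the toy data. An exhaustive search over groups of a fixed size stands in for the genetic algorithm with regularization, purely to keep the example short.

```python
from itertools import combinations

import numpy as np


def normalized_adjacency(W):
    """Symmetrically normalise a weighted adjacency matrix (assumed scheme)."""
    A = W + np.eye(W.shape[0])  # self-loops keep each taxon's own abundance
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def topological_abundance(M, W):
    """One graph-convolution step: M is samples x taxa, W is taxa x taxa."""
    return M @ normalized_adjacency(W)


def co_response_score(x, M_topo, y):
    """Pearson correlation between the group's summed abundance and y."""
    effect = M_topo @ x  # co-response effect of the group, one value per sample
    if np.std(effect) == 0:
        return 0.0
    return float(np.corrcoef(effect, y)[0, 1])


def best_group(M_topo, y, size):
    """Exhaustive stand-in for the genetic algorithm (small inputs only)."""
    n_taxa = M_topo.shape[1]
    best, best_score = None, -np.inf
    for members in combinations(range(n_taxa), size):
        x = np.zeros(n_taxa)
        x[list(members)] = 1.0  # binary group indicator
        score = co_response_score(x, M_topo, y)
        if score > best_score:
            best, best_score = members, score
    return best, best_score


# Hypothetical toy data: 20 samples, 5 taxa, random symmetric edge weights.
rng = np.random.default_rng(0)
M = rng.random((20, 5))
W = rng.random((5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

M_topo = topological_abundance(M, W)
y = M_topo[:, [0, 2]].sum(axis=1)  # functional variable driven by taxa 0 and 2
group, score = best_group(M_topo, y, size=2)
```

Because the toy functional variable is built from taxa 0 and 2, the search recovers exactly that pair. On real data the search space grows combinatorially with the number of taxa, which is why a heuristic optimizer such as the genetic algorithm mentioned in the TL;DR is needed in place of enumeration.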

Paper Structure (11 sections, 8 equations, 10 figures)

Figures (10)

  • Figure 1: Overview of the gFlora method: from observational data (left) to optimizing the functional group $\mathbf{x}$ (right).
  • Figure 2: The gFlora output: a functional co-response group of taxa as an undirected network.
  • Figure 3: Comparison of $\mathbf{M}$ between EQO (the original relative abundance) and gFlora (the updated topological relative abundance).
  • Figure 4: Results on bacteria. Group size selection with AIC (top; dots are AIC values obtained in repeated experiments, and lines show the change in mean values) and comparison of performance (bottom; bars are mean values and error bars are standard deviations). *: significant; NS: not significant.
  • Figure 5: Results on nematodes. Group size selection with AIC (top; dots are AIC values obtained in repeated experiments, and lines show the change in mean values) and comparison of performance (bottom; bars are mean values and error bars are standard deviations). *: significant.
  • ...and 5 more figures