Fused Lasso Improves Accuracy of Co-occurrence Network Inference in Grouped Samples
Daniel Agyapong, Briana H. Beatty, Peter G. Kennedy, Jane C. Marks, Toby D. Hocking
TL;DR
This work tackles the challenge of inferring microbiome co-occurrence networks across heterogeneous environments by introducing the Same-All Cross-validation (SAC) framework, which separately evaluates within-habitat and cross-habitat generalization. It adapts the fused-lasso-based fuser algorithm to microbiome data, decomposing edge weights into a global component and habitat-specific deviations, and optimizes a joint objective with sparsity and fusion penalties across habitats. Across six public grouped-sample datasets, fuser performs comparably to glmnet in homogeneous settings and substantially improves predictive accuracy in cross-environment scenarios, with taxon-wise analyses revealing complementary strengths across methods. The study provides a principled toolbox for cross-environment network inference that is more robust to environmental heterogeneity and gives more nuanced ecological insight into microbial interactions across space and time.
Abstract
Co-occurrence network inference algorithms have significantly advanced our understanding of microbiome communities. However, these algorithms typically analyze microbial associations within samples collected from a single environmental niche, often capturing only static snapshots rather than dynamic microbial processes. Previous studies have commonly grouped samples from different environmental niches together without fully considering how microbial communities adapt their associations when faced with varying ecological conditions. Our study addresses this limitation by explicitly investigating both the spatial and temporal dynamics of microbial communities. We analyzed publicly available microbiome abundance data across multiple locations and time points to evaluate how well algorithms predict microbial associations, using our proposed Same-All Cross-validation (SAC) framework. SAC evaluates algorithms in two distinct scenarios: training and testing within the same environmental niche (Same), and training and testing on combined data from multiple environmental niches (All). To overcome the limitations of conventional algorithms, we propose fuser, an algorithm that, while not entirely new in machine learning, is novel for microbiome community network inference. It retains subsample-specific signals while sharing relevant information across environments during training. Unlike standard approaches that infer a single generalized network from combined data, fuser generates distinct, environment-specific predictive networks. Our results demonstrate that fuser achieves predictive performance comparable to that of existing algorithms such as glmnet when evaluated within homogeneous environments (Same), and notably reduces test error relative to baseline algorithms in cross-environment (All) scenarios.
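The two SAC scenarios described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sac`, `cv_error`, `fit_line`), the toy data, and the single-predictor least-squares learner standing in for glmnet or fuser are all assumptions made for illustration. "Same" cross-validates inside each group separately, while "All" pools every group before cross-validating, following the wording of the abstract.

```python
import random
from statistics import mean

def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares line y = a + b*x (hypothetical stand-in for a real learner)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on held-out points."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def cv_error(rows, k, seed):
    """Average held-out MSE over k folds for a list of (x, y) rows."""
    errs = []
    for test_idx in kfold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train = [rows[i] for i in range(len(rows)) if i not in held_out]
        test = [rows[i] for i in test_idx]
        model = fit_line([x for x, _ in train], [y for _, y in train])
        errs.append(mse(model, [x for x, _ in test], [y for _, y in test]))
    return mean(errs)

def sac(samples, k=3, seed=0):
    """samples: list of (group, x, y). Returns Same (per group) and All errors."""
    groups = sorted({g for g, _, _ in samples})
    # Same: train and test within one environmental niche at a time.
    same = {g: cv_error([(x, y) for grp, x, y in samples if grp == g], k, seed)
            for g in groups}
    # All: train and test on the data pooled across niches.
    pooled = cv_error([(x, y) for _, x, y in samples], k, seed)
    return {"Same": same, "All": pooled}

# Toy data (assumed): two niches where the association has opposite sign.
toy = [("A", x, 2.0 * x) for x in range(9)] + [("B", x, -2.0 * x) for x in range(9)]
scores = sac(toy)
print(scores)
```

In this toy setting a single pooled model cannot represent both niches, so the All error is much larger than either Same error. That gap is the kind of situation in which an approach that shares information across groups while keeping group-specific coefficients could help.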
