GraphGDel: Constructing and Learning Graph Representations of Genome-Scale Metabolic Models for Growth-Coupled Gene Deletion Prediction
Ziwei Yang, Takeyuki Tamura
TL;DR
This work tackles growth-coupled gene deletion prediction in genome-scale metabolic models, where traditional approaches lack explicit graph structure. It introduces GraphGDel, which first constructs biologically meaningful graphs from constraint-based models through a four-step pipeline that reduces noise from hubs and currency metabolites. It then learns from both sequence data (SMILES strings for metabolites and amino acid sequences for genes) and graph topology via four neural modules (Meta-M, Gene-M, Graph-M, Pred-M) trained end-to-end with a composite loss. Across three metabolic models, the approach outperforms strong baselines (DNN and DeepGdel) on multiple metrics, and ablation studies demonstrate the value of integrating sequential and graph-based metabolite representations. The framework offers a scalable, data-driven way to augment strain-design workflows, with potential extensions to multi-layered graphs incorporating reactions and genes and to more advanced sequence models for richer representations.
Abstract
In genome-scale constraint-based metabolic models, gene deletion strategies are essential for achieving growth-coupled production, where cell growth and target metabolite synthesis occur simultaneously. Despite the inherently networked nature of genome-scale metabolic models, existing computational approaches rely primarily on sequential data and lack graph representations that capture their complex relationships, because both well-defined graph constructions and learning frameworks capable of exploiting them remain largely unexplored. To address this gap, we present a twofold solution. First, we introduce a systematic pipeline for constructing graph representations from constraint-based metabolic models. Second, we develop a deep learning framework that integrates these graph representations with gene and metabolite sequence data to predict growth-coupled gene deletion strategies. Across three metabolic models of varying scale, our approach consistently outperforms established baselines, achieving improvements of 14.04%, 16.26%, and 13.18% in overall accuracy, respectively. The source code and example datasets are available at: https://github.com/MetNetComp/GraphGDel.
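
To make the idea of a graph representation concrete, the following minimal Python sketch shows one generic way a metabolite graph could be derived from reaction stoichiometry while skipping currency metabolites. It is not the four-step pipeline of GraphGDel, whose details are not given in this abstract. The toy reactions, the metabolite names, the `CURRENCY` set, and the function `build_metabolite_graph` are all hypothetical and chosen only for illustration.

```python
# Toy stoichiometry: reaction -> {metabolite: coefficient}.
# Negative coefficients are consumed, positive are produced.
# All reaction and metabolite names here are hypothetical.
REACTIONS = {
    "R1": {"glc": -1, "atp": -1, "g6p": 1, "adp": 1},
    "R2": {"g6p": -1, "f6p": 1},
    "R3": {"f6p": -1, "atp": -1, "fbp": 1, "adp": 1},
}

# Assumed set of currency metabolites to exclude from the graph.
CURRENCY = {"atp", "adp"}


def build_metabolite_graph(reactions, currency):
    """Return directed substrate -> product edges, skipping currency metabolites."""
    edges = set()
    for stoich in reactions.values():
        substrates = [m for m, c in stoich.items() if c < 0 and m not in currency]
        products = [m for m, c in stoich.items() if c > 0 and m not in currency]
        # Connect every remaining substrate to every remaining product.
        for s in substrates:
            for p in products:
                edges.add((s, p))
    return edges


edges = build_metabolite_graph(REACTIONS, CURRENCY)
print(sorted(edges))
# → [('f6p', 'fbp'), ('g6p', 'f6p'), ('glc', 'g6p')]
```

Without the currency filter, the same toy network would also contain edges such as `('atp', 'adp')`, which link otherwise unrelated reactions through a shared cofactor. This is the kind of hub noise that graph construction for metabolic models typically aims to reduce.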
