
GraphGDel: Constructing and Learning Graph Representations of Genome-Scale Metabolic Models for Growth-Coupled Gene Deletion Prediction

Ziwei Yang, Takeyuki Tamura

TL;DR

This work tackles growth-coupled gene deletion prediction in genome-scale metabolic models, for which existing approaches lack an explicit graph structure. It introduces GraphGDel, which first constructs biologically meaningful graphs from constraint-based models through a four-step pipeline that reduces noise from hub and currency metabolites, and then learns from both sequence data (SMILES strings for metabolites and amino acid sequences for genes) and graph topology via four neural modules (Meta-M, Gene-M, Graph-M, Pred-M) trained end-to-end with a composite loss. Across three metabolic models, the approach outperforms strong baselines (DNN and DeepGDel) on multiple metrics, and ablation studies demonstrate the value of integrating sequential and graph-based metabolite representations. The framework offers a scalable, data-driven way to augment strain-design workflows, with potential extensions to multi-layered graphs that incorporate reactions and genes, and to more advanced sequence models for richer representations.
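The four construction steps are not spelled out in this summary, so the following is only a minimal sketch of the general idea, assuming a metabolite graph in which each substrate of a reaction is linked to each of its products. Currency metabolites are dropped using a hypothetical `CURRENCY` list, and hubs are pruned with a hypothetical degree cap `max_degree`. The toy `REACTIONS` (unrelated to Figure 1), `CURRENCY`, and `build_metabolite_graph` are illustrative names, not GraphGDel's actual rules or code.

```python
from collections import defaultdict

# Hypothetical toy model: reaction id -> (substrates, products).
# Illustrative only; not the paper's data or its exact pipeline.
REACTIONS = {
    "rxn1": (["A", "atp"], ["B", "adp"]),
    "rxn2": (["B"], ["C"]),
    "rxn3": (["B", "atp"], ["D", "adp"]),
    "rxn4": (["C", "D"], ["E"]),
}

# Hypothetical currency-metabolite list (cofactors shared by many reactions).
CURRENCY = {"atp", "adp", "nad", "nadh", "h2o", "h"}


def build_metabolite_graph(reactions, currency, max_degree=None):
    """Build an undirected metabolite graph as sorted adjacency lists.

    Each non-currency substrate is linked to each non-currency product of
    the same reaction; if max_degree is given, nodes with more neighbors
    than that (assumed hubs) are removed together with their edges.
    """
    adj = defaultdict(set)
    for substrates, products in reactions.values():
        for s in substrates:
            for p in products:
                if s in currency or p in currency or s == p:
                    continue  # skip currency metabolites and self-loops
                adj[s].add(p)
                adj[p].add(s)
    if max_degree is not None:
        hubs = {m for m, nbrs in adj.items() if len(nbrs) > max_degree}
        adj = {m: nbrs - hubs for m, nbrs in adj.items() if m not in hubs}
    return {m: sorted(nbrs) for m, nbrs in adj.items()}


graph = build_metabolite_graph(REACTIONS, CURRENCY)
pruned = build_metabolite_graph(REACTIONS, CURRENCY, max_degree=2)
```

In this toy input, `B` has three neighbors, so the call with `max_degree=2` removes it and leaves `A` isolated, which suggests why a real pipeline would need a more careful hub criterion than a fixed degree cap.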

Abstract

In genome-scale constraint-based metabolic models, gene deletion strategies are essential for achieving growth-coupled production, where cell growth and target metabolite synthesis occur simultaneously. Despite the inherently networked nature of genome-scale metabolic models, existing computational approaches rely primarily on sequential data and lack graph representations that capture their complex relationships, as both well-defined graph constructions and learning frameworks capable of exploiting them remain largely unexplored. To address this gap, we present a twofold solution. First, we introduce a systematic pipeline for constructing graph representations from constraint-based metabolic models. Second, we develop a deep learning framework that integrates these graph representations with gene and metabolite sequence data to predict growth-coupled gene deletion strategies. Across three metabolic models of varying scale, our approach consistently outperforms established baselines, achieving improvements of 14.04%, 16.26%, and 13.18% in overall accuracy. The source code and example datasets are available at: https://github.com/MetNetComp/GraphGDel.
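As background, growth coupling in constraint-based models is commonly formalized as a pair of linear programs. The block below is a standard formulation, not quoted from this paper: $S$ is the stoichiometric matrix, $v$ the flux vector, $[l_i, u_i]$ the bounds of reaction $r_i$ (as in Figure 1), and $R(D)$ the set of reactions switched off when the genes in $D$ are deleted, as determined by the model's gene-protein-reaction rules. In Figure 1's toy model, $v_{\mathrm{growth}}$ and $v_{\mathrm{target}}$ would correspond to $r_7$ and $r_8$. Some formulations replace the strict inequalities with small positive thresholds.

```latex
\begin{aligned}
\mu^{*}(D) &= \max_{v}\; v_{\mathrm{growth}}
  \quad \text{s.t.}\quad S v = 0,\;\; l_i \le v_i \le u_i \;\; (i = 1,\dots,n),\;\; v_j = 0 \;\; (j \in R(D)),\\[4pt]
\pi^{*}(D) &= \min_{v}\; v_{\mathrm{target}}
  \quad \text{s.t.}\quad \text{the same constraints and } v_{\mathrm{growth}} = \mu^{*}(D),\\[4pt]
D \text{ is growth-coupled} &\iff \mu^{*}(D) > 0 \;\text{ and }\; \pi^{*}(D) > 0 .
\end{aligned}
```

The minimization in the second line takes the worst case over all growth-maximizing flux distributions, so a deletion set counts as growth-coupled only if production is forced whenever the cell grows optimally.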

Paper Structure

This paper contains 36 sections, 50 equations, 2 figures, 9 tables, 4 algorithms.

Figures (2)

  • Figure 1: A toy example of a constraint-based model, where circles and rectangles represent metabolites and reactions, respectively. Black and white rectangles denote external and internal reactions, respectively. $r_1$ and $r_2$ correspond to two substrate uptake reactions. $r_7$ and $r_8$ correspond to the cell growth and target metabolite production reactions, respectively. The rate of each reaction $r_i$ is constrained to the range $[l_i, u_i]$. In this example, all stoichiometric coefficients are 1 or -1.
  • Figure 2: A system overview of the proposed gene deletion strategy prediction framework. The framework comprises four neural network-based modules: (1) Meta-M, which learns the metabolite latent representation $Z_{meta}$; (2) Gene-M, which learns the gene latent representation $Z_{gene}$; (3) Graph-M, which learns the refined metabolite latent representation $Z_{metaG}$ on a specific metabolic graph; and (4) Pred-M, which integrates the three upstream latent representations into a joint latent representation $Z$ for the final gene deletion prediction.
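The data flow in Figure 2 can be sketched in plain Python under loudly stated assumptions. The sequence encoders (Meta-M over SMILES, Gene-M over amino acid sequences) are not modeled; their outputs are taken as given vectors. Graph-M is stood in for by one round of mean aggregation over each metabolite and its neighbors (a GCN-style propagation chosen for illustration; the paper's actual layer type is not stated here), and Pred-M by concatenation into $Z$ followed by a logistic output. `refine_on_graph`, `predict`, the logistic head, and all toy values are hypothetical.

```python
import math


def refine_on_graph(z_meta, adj):
    """Hypothetical Graph-M stand-in: one round of mean aggregation.

    Each metabolite's refined vector is the element-wise mean of its own
    vector and its neighbors' vectors (a simple GCN-style propagation step).
    """
    refined = {}
    for met, vec in z_meta.items():
        group = [vec] + [z_meta[n] for n in adj.get(met, []) if n in z_meta]
        refined[met] = [sum(col) / len(group) for col in zip(*group)]
    return refined


def predict(z_meta_t, z_gene_g, z_metag_t, weights, bias=0.0):
    """Hypothetical Pred-M stand-in for one (target, gene) pair.

    Concatenates the three latent vectors into Z and applies a logistic
    output, read here as the probability that gene g is deleted.
    """
    z = z_meta_t + z_gene_g + z_metag_t  # list concatenation -> Z
    if len(weights) != len(z):
        raise ValueError("weights must match the length of Z")
    logit = sum(w * x for w, x in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: Meta-M / Gene-M outputs are assumed to be given vectors.
z_meta = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
z_metag = refine_on_graph(z_meta, adj)
p = predict(z_meta["C"], [0.5], z_metag["C"], weights=[0.1] * 5)
```

Concatenation keeps the sequence-based and graph-refined metabolite views side by side in $Z$, which mirrors the caption's description of Pred-M integrating all three upstream latents; with all-zero weights the head returns 0.5, the uninformative probability.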