A generalizable framework for unlocking missing reactions in genome-scale metabolic networks using deep learning

Xiaoyi Liu, Hongpeng Yang, Chengwei Ai, Ruihan Dong, Yijie Ding, Qianqian Yuan, Jijun Tang, Fei Guo

TL;DR

CLOSEgaps addresses the challenge of incomplete genome-scale metabolic models by reframing gap-filling as hyperedge prediction on a hypergraph representation of GEMs. It uses a hypergraph convolutional network with attention to rank and predict missing reactions from a pool of hypothetical BiGG reactions, enabling automatic gap-filling without experimental data. The approach yields high accuracy in recovering artificial gaps, improves phenotypic predictions across 24 draft GEMs, and enhances fermentation-relevant metabolite production in select organisms, outperforming multiple topology-based baselines. By integrating MILP-guided validation and using negative sampling, CLOSEgaps offers a scalable, generalizable framework for rapid GEM curation and metabolic design.

Abstract

Incomplete knowledge of metabolic processes limits the accuracy of genome-scale metabolic models (GEMs), which in turn impedes advances in systems biology and metabolic engineering. Existing gap-filling methods typically rely on phenotypic data to minimize the disparity between computational predictions and experimental results. However, an automatic and precise gap-filling method is still lacking for initial-state GEMs, before experimental data and annotated genomes become available. In this study, we introduce CLOSEgaps, a deep learning-driven tool that addresses gap-filling by modeling it as a hyperedge prediction problem within GEMs. Specifically, CLOSEgaps maps metabolic networks to hypergraphs and learns their hyper-topology features to identify missing reactions and gaps by leveraging hypothetical reactions. This approach allows for the characterization and curation of both known and hypothetical reactions within metabolic networks. Extensive results demonstrate that CLOSEgaps accurately fills over 96% of artificially introduced gaps for various GEMs. Furthermore, CLOSEgaps enhances phenotypic predictions for 24 GEMs and yields a notable improvement in the production of four crucial metabolites (lactate, ethanol, propionate, and succinate) in two organisms. As a broadly applicable solution for any GEM, CLOSEgaps represents a promising model to automate the gap-filling process and uncover missing connections between reactions and observed metabolic phenotypes.
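
To make the hypergraph framing concrete, the following minimal sketch builds a binary incidence matrix in which metabolites are hypernodes and reactions are hyperedges. The reaction IDs, metabolite names, and the `build_incidence` helper are hypothetical and chosen only for illustration; this is not the CLOSEgaps implementation, and it ignores stoichiometric coefficients and reaction direction.

```python
from typing import Dict, List, Tuple

# Toy reactions (hypothetical IDs): metabolite -> stoichiometric coefficient.
reactions: Dict[str, Dict[str, float]] = {
    "R1": {"glc": -1, "atp": -1, "g6p": 1, "adp": 1},
    "R2": {"g6p": -1, "f6p": 1},
    "R3": {"f6p": -1, "atp": -1, "fdp": 1, "adp": 1},
}


def build_incidence(
    rxns: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (metabolites, reaction_ids, H) with H[i][j] = 1 when
    metabolite i takes part in reaction j, and 0 otherwise."""
    metabolites = sorted({m for stoich in rxns.values() for m in stoich})
    reaction_ids = list(rxns)
    row = {m: i for i, m in enumerate(metabolites)}
    H = [[0] * len(reaction_ids) for _ in metabolites]
    for j, rid in enumerate(reaction_ids):
        for m in rxns[rid]:
            H[row[m]][j] = 1
    return metabolites, reaction_ids, H


metabolites, reaction_ids, H = build_incidence(reactions)

# Hyperedge degree: number of metabolites per reaction (column sums).
edge_degree = [sum(col) for col in zip(*H)]
# Hypernode degree: number of reactions per metabolite (row sums).
node_degree = [sum(r) for r in H]
```

Under this representation, a candidate reaction from a hypothetical pool is simply one more column of `H`, so predicting a missing reaction amounts to scoring whether that column is a plausible hyperedge.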

Paper Structure (4 sections, 5 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: (a) The CLOSEgaps pipeline, organized into five phases: (1) mapping the GEM to a hypergraph, (2) negative sampling, (3) feature initialization, (4) feature refinement, and (5) prediction or gap-filling. (b) Mapping the GEM to a hypergraph using the BiGG reaction database and the ChEBI metabolite database; metabolites are represented by SMILES. The processed ChEBI database is used for negative sampling. (c) Processing and prediction. (c.1) The incidence matrices of the hypergraph (the incidence matrix of the GEM and the incidence matrix with negative samples) and the metabolite similarity matrix are used to initialize features through a fully connected layer. (c.2) Hypergraph convolution and hypergraph attention networks refine the hypernode and hyperedge features. (c.3) The ranking module predicts missing reactions. (d) The gap-filling inference workflow: (1) the draft GEM and the hypothetical reaction database are taken as input, the full GEM is used as the training set, and each reaction in the hypothetical reaction pool is ranked; (2) FBA is used to predict fermentation phenotypes for the gap-filled GEMs and the wild-type GEMs; and (3) MILP suggests the missing reactions that causally account for the production of the phenotypes.
  • Figure 2: Performance validation using artificially introduced gaps. (A, B, C) Boxplots of the performance metrics (AUC, AUPR, Accuracy) calculated on 7 datasets (each dot represents a dataset) for CLOSEgaps vs. CHESHIRE, GraphSAGE, HGNN, RGNN, NHP, GCN, and Node2Vector.
  • Figure 3: Performance validation using artificially introduced gaps. (A, B, C) Boxplots of the performance metrics (F1 score, Precision, and Recall) calculated on 7 datasets (each dot represents a dataset) for CLOSEgaps vs. CHESHIRE, GraphSAGE, HGNN, RGNN, NHP, GCN, and Node2Vector.
  • Figure 4: Comparison of CLOSEgaps with other methods (CHESHIRE, GraphSAGE, HGNN, RGNN, NHP, GCN, and Node2Vector) in the recovery of reactions from $4$ GEMs. Reactions were removed randomly from the GEMs and treated as unobserved in the testing set.
  • Figure 5: Performance Comparison of Fermentation Product Predictions in Gap-Filled Metabolic Networks. Boxplots of the performance metrics (F1 score, AUC, AUPR, Precision, Recall, and Accuracy) calculated on $24$ BiGG GEMs (each dot represents a GEM) for Node2Vector, GCN, RGCN, HGNN, GraphSAGE, CHESHIRE, and CLOSEgaps. “CarveMe” represents the draft models reconstructed from CarveMe. Each GEM was subsequently gap-filled with $200$ additional reactions predicted by each respective model. The median value for each metric is indicated by the central line in the boxplots. Statistical significance was assessed using a two-sided paired-sample t-test, with exact p-values reported.
  • ...and 4 more figures
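
The Figure 1 caption lists negative sampling as the second phase but does not spell out the procedure here. The sketch below shows one common corruption scheme for hyperedge prediction: swap a fraction of a reaction's metabolites for others drawn from a metabolite pool, and reject any candidate that matches a known reaction. The function name `sample_negative`, the 50% replacement fraction, and the toy pool are all assumptions made for illustration, not the authors' exact method.

```python
import random
from typing import FrozenSet, Set


def sample_negative(
    reaction: FrozenSet[str],
    pool: Set[str],
    known: Set[FrozenSet[str]],
    rng: random.Random,
    replace_fraction: float = 0.5,  # assumed value, not taken from the paper
    max_tries: int = 100,
) -> FrozenSet[str]:
    """Corrupt a positive reaction into a negative one of the same size."""
    members = sorted(reaction)
    n_replace = max(1, round(len(members) * replace_fraction))
    outside = sorted(pool - reaction)  # metabolites not in this reaction
    if len(outside) < n_replace:
        raise ValueError("metabolite pool too small for this reaction")
    for _ in range(max_tries):
        dropped = set(rng.sample(members, n_replace))
        added = set(rng.sample(outside, n_replace))
        candidate = frozenset((reaction - dropped) | added)
        # Reject candidates that coincide with a real reaction.
        if candidate not in known:
            return candidate
    raise RuntimeError("could not find a negative sample")


# Toy positives and metabolite pool (hypothetical names).
positives = [
    frozenset({"glc", "atp", "g6p", "adp"}),
    frozenset({"g6p", "f6p"}),
    frozenset({"f6p", "atp", "fdp", "adp"}),
]
pool = {"glc", "atp", "adp", "g6p", "f6p", "fdp", "pyr", "lac"}
known = set(positives)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
negatives = [sample_negative(r, pool, known, rng) for r in positives]
```

Each negative keeps the size of its source reaction, so a classifier cannot separate positives from negatives by hyperedge size alone. In practice the pool would come from a metabolite database such as the processed ChEBI set mentioned in the caption.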