Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production

Kexuan Xin, Qingyun Wang, Junyu Chen, Pengfei Yu, Huimin Zhao, Heng Ji

TL;DR

An Interactive Knowledge Transfer mechanism based on Metabolism Graphs (IKT4Meta) is introduced to improve gene-metabolite association prediction by integrating knowledge across metabolisms. Pretrained Language Models (PLMs) generate inter-graph links, which mitigates heterogeneity between graphs, and intra-graph links are then propagated using these inter-graph links as anchors.

Abstract

In the rapidly evolving field of metabolic engineering, identifying gene targets efficiently and precisely for enhanced metabolite production presents significant challenges. Traditional approaches, whether knowledge-based or model-based, are notably time-consuming and labor-intensive, due to the vast scale of the research literature and the approximate nature of genome-scale metabolic model (GEM) simulations. Therefore, we propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs, to automate candidate gene discovery for a given pair of metabolite and candidate-associated genes, and we present the first benchmark, containing 2474 metabolites and 1947 genes of two commonly used microorganisms, Saccharomyces cerevisiae (SC) and Issatchenkia orientalis (IO). This task is challenging due to the incompleteness of the metabolic graphs and the heterogeneity among distinct metabolisms. To overcome these limitations, we propose an Interactive Knowledge Transfer mechanism based on Metabolism Graph (IKT4Meta), which improves association prediction accuracy by integrating knowledge from different metabolism graphs. First, to build a bridge between two graphs for knowledge transfer, we utilize Pretrained Language Models (PLMs) with external knowledge of genes and metabolites to help generate inter-graph links, significantly alleviating the impact of heterogeneity. Second, we propagate intra-graph links from different metabolic graphs using inter-graph links as anchors. Finally, we conduct gene-metabolite association prediction based on the enriched metabolism graphs, which integrate knowledge from multiple microorganisms. Experiments on both organisms demonstrate that our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
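
To make the first two steps of the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that inter-graph links can be approximated by thresholding cosine similarity between node embeddings, and that propagation copies an edge from one graph into the other whenever both endpoints have an anchor. The toy vectors stand in for PLM text embeddings, and the node names, the `threshold` value, and the functions `build_anchors` and `propagate_links` are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_anchors(emb_src, emb_dst, threshold=0.9):
    """Assumed stand-in for inter-graph link generation: map each source
    node to its most similar target node if similarity >= threshold."""
    anchors = {}
    for node_src, vec_src in emb_src.items():
        best_node, best_sim = None, threshold
        for node_dst, vec_dst in emb_dst.items():
            sim = cosine(vec_src, vec_dst)
            if sim >= best_sim:
                best_node, best_sim = node_dst, sim
        if best_node is not None:
            anchors[node_src] = best_node
    return anchors


def propagate_links(edges_src, edges_dst, anchors):
    """Copy each source edge into the target graph when both of its
    endpoints are anchored to target nodes."""
    enriched = set(edges_dst)
    for head, tail in edges_src:
        if head in anchors and tail in anchors:
            enriched.add((anchors[head], anchors[tail]))
    return enriched


# Toy embeddings (hypothetical); a real pipeline would encode textual
# descriptions of genes and metabolites with a PLM instead.
emb_sc = {"sc_gene": [1.0, 0.0], "sc_met": [0.0, 1.0]}
emb_io = {"io_gene": [0.9, 0.1], "io_met": [0.1, 0.9], "io_other": [0.7, 0.7]}

# Directed (metabolite -> gene) edges in each toy graph.
edges_sc = {("sc_met", "sc_gene")}
edges_io = {("io_other", "io_gene")}

anchors = build_anchors(emb_sc, emb_io)
enriched_io = propagate_links(edges_sc, edges_io, anchors)
print(anchors)
print(sorted(enriched_io))
```

In this toy setup the IO graph gains an edge that it did not originally contain, which is the kind of enrichment that the final association prediction step would then operate on. Any link predictor could be applied to `enriched_io`; the choice of predictor is outside the scope of this sketch.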

Paper Structure

This paper contains 24 sections, 10 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: An example of the subgraphs from Saccharomyces cerevisiae (SC) and Issatchenkia orientalis (IO). Each metabolic graph contains genes and metabolites, with intra-graph links showing gene-metabolite associations and inter-graph links connecting equivalent genes or metabolites. Predictions for these links are based on existing associations. For example, (cytidine$\rightarrow$G_g514) shows that G_g514 is involved in a reaction with cytidine as a reactant, while (YLR245C$\rightarrow$H$^+$) indicates YLR245C catalyzes a reaction producing H$^+$.
  • Figure 2: Comparison between text and graph structures for gene alignment. ScholarBERT (Hong et al., 2023) successfully identifies that P53 and RAD9 from different microorganisms are highly similar based on textual information. However, because their structural contexts differ significantly, the structure encoder (Kipf and Welling, 2016) fails to recognize this pair of similar genes.
  • Figure 3: Overview of the IKT4Meta framework.
  • Figure 4: Association prediction performance with different values of $\gamma_d$.

Theorems & Definitions (3)

  • Definition 1: Metabolic graph
  • Definition 2: Graph alignment
  • Definition 3: Gene-Metabolite Association Prediction