Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers

Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, Michalis Vazirgiannis

TL;DR

A novel approach is proposed, Prot2Text, which predicts a protein's function in a free text style, moving beyond the conventional binary or categorical classifications, by combining Graph Neural Networks and Large Language Models in an encoder-decoder framework.

Abstract

In recent years, significant progress has been made in the field of protein function prediction with the development of various machine-learning approaches. However, most existing methods formulate the task as a multi-class classification problem, i.e. assigning predefined labels to proteins. In this work, we propose a novel approach, Prot2Text, which predicts a protein's function in a free text style, moving beyond the conventional binary or categorical classifications. By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model effectively integrates diverse data types including protein sequence, structure, and textual annotation and description. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate functional descriptions. To evaluate our model, we extract a multimodal protein dataset from SwissProt and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate function prediction of existing as well as previously unseen proteins.

Paper Structure (28 sections, 4 equations, 5 figures, 2 tables)

This paper contains 28 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Architecture of the proposed Prot2Text framework for predicting protein function descriptions in free text. The model leverages a multimodal approach that integrates protein sequence, structure, and textual annotations. The encoder-decoder framework forms the backbone of the model, with the encoder component utilizing a relational graph convolution network (RGCN) to process the protein graphs and an ESM model to process the protein sequence. A cross-attention mechanism facilitates the exchange of relevant information between the graph-encoded and the sequence-encoded vectors, creating a fused representation synthesizing the structural and textual aspects. The decoder component employs a pre-trained GPT-2 model to generate detailed and accurate protein descriptions from the fused protein representation. By combining the power of Graph Neural Networks and Large Language Models, Prot2Text enables a holistic representation of protein function, facilitating the generation of comprehensive protein descriptions.
  • Figure 2: The test BLEU score for Prot2Text models as a function of the percentage identity of BLAST hits between the test and training sets.
  • Figure 3: Ground-truth labels vs. free-text generated functions: A textual comparison of the pre-defined labels and generated text outputs for 3 different proteins from the test set. The text generation configuration used for these examples is the following: length_penalty=2.0, no_repeat_ngram_size=3 and early_stopping=True.
  • Figure 4: Analyzing Protein Description Lengths: Distribution of Tokens per Sample with Threshold Highlight at 256 tokens (in red).
  • Figure 5: Tracking Prot2Text$_{BASE}$ BLEU Score Progression on Validation Set Across Training Iterations. Higher is better.
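
The Figure 3 caption lists no_repeat_ngram_size=3 among the generation settings. As a rough illustration of what such a constraint does during decoding, the sketch below computes which next tokens would be blocked because they would repeat an n-gram already present in the generated text. This is a simplified pure-Python illustration, not the paper's code or any library's implementation; the function name `banned_next_tokens` and the example tokens are hypothetical.

```python
def banned_next_tokens(generated, n=3):
    """Return tokens that would complete an n-gram already seen in `generated`.

    Hypothetical helper for illustration only: with n=3, a token is banned
    if appending it would repeat a 3-gram that already occurs in the sequence.
    """
    if n <= 0 or len(generated) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    # Scan every n-gram already in the sequence; if its first n-1 tokens
    # match the current prefix, its final token must not be generated next.
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


# Made-up example tokens: the sequence ends with "binds to", which
# was previously followed by "dna", so "dna" is blocked as the next token.
tokens = ["binds", "to", "dna", "and", "binds", "to"]
print(banned_next_tokens(tokens, n=3))
```

In a real decoder, the scores of the banned tokens would be suppressed at each step before the next token is chosen, so no 3-token sequence appears twice in the generated description.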