Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers

Disha Varshney; Samarth Garg; Sarthak Tyagi; Deeksha Varshney; Nayan Deep; Asif Ekbal

TL;DR

This work tackles protein secondary structure prediction from sequence by introducing SSRGNet, a framework that fuses DistilProtBert sequence embeddings with a Relational Graph Convolutional Network operating on multi-relational protein graphs. The method explicitly encodes 3D structural information through three edge types and uses a parallel fusion strategy to combine sequence and structure cues, achieving improved F1 scores on the NetSurfP-2.0 benchmarks across CB513, TS115, and CASP12. Key contributions include the design of a residue-graph representation, a two-layer R-GCN for relational message passing, and an evaluation showing structure-aware encoding enhances PSSP performance beyond sequence-only baselines. The approach has implications for more accurate secondary- and tertiary-structure predictions and could inform protein function analysis and drug design through better structural representations.
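A minimal sketch of the multi-relational residue graph follows, under stated assumptions. The source says only that three edge types encode sequential and structural connections. The three relations used here are hypothetical: backbone neighbours, a Cα distance cutoff, and k-nearest neighbours. So are the function name `build_residue_graph` and the default values of `radius`, `k`, and `min_seq_gap`. This is an illustration of the idea, not the authors' implementation.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Coord = Tuple[float, float, float]  # C-alpha coordinates in angstroms
Edge = Tuple[int, int]              # (i, j): j is a neighbour of residue i

# Hypothetical relation names; the paper's actual three edge types may differ.
SEQUENTIAL, RADIUS, KNN = "sequential", "radius", "knn"


def build_residue_graph(
    coords: Sequence[Coord],
    radius: float = 10.0,   # assumed distance cutoff in angstroms
    k: int = 3,             # assumed number of spatial nearest neighbours
    min_seq_gap: int = 2,   # assumed: spatial edges skip immediate sequence neighbours
) -> Dict[str, Set[Edge]]:
    """Return one set of directed edges per relation type."""
    n = len(coords)
    edges: Dict[str, Set[Edge]] = {SEQUENTIAL: set(), RADIUS: set(), KNN: set()}

    # Relation 1: consecutive residues along the backbone, in both directions.
    for i in range(n - 1):
        edges[SEQUENTIAL].add((i, i + 1))
        edges[SEQUENTIAL].add((i + 1, i))

    for i in range(n):
        # Euclidean distances to residues far enough apart in sequence.
        dists = [
            (math.dist(coords[i], coords[j]), j)
            for j in range(n)
            if abs(i - j) >= min_seq_gap
        ]
        # Relation 2: every residue within the distance cutoff.
        for d, j in dists:
            if d <= radius:
                edges[RADIUS].add((i, j))
        # Relation 3: the k spatially closest residues.
        for _, j in sorted(dists)[:k]:
            edges[KNN].add((i, j))
    return edges


# Toy example: five C-alpha atoms on a straight line, 3.8 angstroms apart.
toy = [(3.8 * i, 0.0, 0.0) for i in range(5)]
graph = build_residue_graph(toy)
print({rel: len(es) for rel, es in graph.items()})
# {'sequential': 8, 'radius': 6, 'knn': 12}
```

Keeping a separate edge set per relation makes the edge type explicit, which is what a relational layer such as R-GCN needs in order to apply a distinct weight matrix to each relation.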

Abstract

In this study, we tackle the challenging task of predicting secondary structure from protein primary sequences. This is a pivotal first step toward predicting tertiary structure, and it also yields crucial insights into protein activity, relationships, and functions. Existing methods often rely on large sets of unlabeled amino acid sequences. However, these approaches neither explicitly capture nor exploit the available protein 3D structural data, which is recognized as a decisive factor in determining protein function. To address this, we use protein residue graphs and introduce several types of sequential and structural connections to capture richer spatial information. We combine Graph Neural Networks (GNNs) and Language Models (LMs). A pre-trained transformer-based protein language model encodes amino acid sequences, and message-passing mechanisms such as GCN and R-GCN capture the geometric characteristics of protein structures. We apply relation-aware convolutions over each node's local neighborhood and stack multiple convolutional layers. This lets the model efficiently learn combined information from the protein's spatial graph, revealing intricate interconnections and dependencies in its structural arrangement. To assess our model's performance, we used the training dataset provided by NetSurfP-2.0, which labels secondary structure in 3-state and 8-state formats. Extensive experiments show that our proposed model, SSRGNet, surpasses the baseline in F1 score.
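Relational message passing and the sequence/structure fusion can be illustrated with the standard R-GCN update rule: h_i' = ReLU(W_0 h_i + Σ_r Σ_{j ∈ N_i^r} (1/|N_i^r|) W_r h_j). Here N_i^r is the set of neighbours of residue i under relation r. The pure-Python sketch below stacks two such layers and then fuses the graph branch with a sequence branch. It rests on several assumptions. Random vectors stand in for DistilProtBert embeddings and node features, and all weights are random. Concatenation followed by a linear softmax classifier is a hypothetical reading of "parallel fusion". The names `rgcn_layer` and `fuse_and_classify` are illustrative only.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]
Matrix = List[List[float]]  # stored as rows: out_dim x in_dim
Edge = Tuple[int, int]      # (i, j): node i receives a message from neighbour j


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by a vector of length in_dim."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(h: List[Vector], edges: Dict[str, Set[Edge]],
               w_self: Matrix, w_rel: Dict[str, Matrix]) -> List[Vector]:
    """One R-GCN layer: W_0 h_i plus a per-relation mean of W_r h_j, then ReLU."""
    out = []
    for i, h_i in enumerate(h):
        total = matvec(w_self, h_i)  # self-connection term W_0 h_i
        for rel, rel_edges in edges.items():
            nbrs = [j for (src, j) in rel_edges if src == i]
            if not nbrs:
                continue
            norm = 1.0 / len(nbrs)  # normalisation constant c_{i,r} = |N_i^r|
            for j in nbrs:
                msg = matvec(w_rel[rel], h[j])  # relation-specific transform W_r h_j
                total = [t + norm * m for t, m in zip(total, msg)]
        out.append([max(0.0, v) for v in total])  # ReLU
    return out


def fuse_and_classify(seq_emb: List[Vector], graph_emb: List[Vector],
                      w_out: Matrix) -> List[Vector]:
    """Hypothetical parallel fusion: concatenate both branches per residue,
    then apply a linear layer and softmax over secondary-structure classes."""
    probs = []
    for s, g in zip(seq_emb, graph_emb):
        logits = matvec(w_out, s + g)
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy setup: 4 residues, random stand-ins for every learned quantity.
random.seed(0)
n, d_seq, d_node, d_hid, n_classes = 4, 6, 4, 5, 3  # n_classes = 3 for 3-state labels
relations = ["sequential", "radius", "knn"]
edges = {
    "sequential": {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)},
    "radius": {(0, 2), (2, 0)},
    "knn": {(0, 3), (3, 0)},
}
seq_emb = [[random.gauss(0.0, 1.0) for _ in range(d_seq)] for _ in range(n)]  # stand-in for LM output
h = [[random.gauss(0.0, 1.0) for _ in range(d_node)] for _ in range(n)]       # stand-in node features

# Graph branch: two stacked R-GCN layers, each with its own weights.
for d_in in (d_node, d_hid):
    h = rgcn_layer(h, edges, rand_matrix(d_hid, d_in),
                   {r: rand_matrix(d_hid, d_in) for r in relations})

probs = fuse_and_classify(seq_emb, h, rand_matrix(n_classes, d_seq + d_hid))
print([max(range(n_classes), key=p.__getitem__) for p in probs])  # predicted class per residue
```

Normalising by the neighbour count per relation keeps residues with many spatial contacts from dominating the update. The separate `w_rel` matrices are what make the convolution relation-aware rather than a plain GCN.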
