HeMeNet: Heterogeneous Multichannel Equivariant Network for Protein Multitask Learning

Rong Han, Wenbing Huang, Lingxiao Luo, Xinyan Han, Jiaming Shen, Zhiqiang Zhang, Jun Zhou, Ting Chen

TL;DR

The paper tackles data sparsity in structure-based protein tasks by introducing Protein-MT, a six-task benchmark integrated from four public datasets. Its tasks are ligand binding affinity (LBA), protein-protein affinity (PPA), Enzyme Commission (EC) number prediction, and the three Gene Ontology (GO) branches GO-MF, GO-BP, and GO-CC. It proposes HeMeNet, an $E(3)$-equivariant, heterogeneous multichannel GNN with a task-aware readout that enables joint learning across diverse inputs and tasks. Across single-task and multi-task settings, HeMeNet achieves state-of-the-art results on most tasks, with substantial cross-task gains in affinity prediction, which suggests knowledge transfer between structure-based property labels and binding affinities. The work provides a generalist framework for multitask protein learning that could support more integrated structure- and function-aware drug discovery pipelines.
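To make the $E(3)$-equivariance property concrete, here is a minimal sketch of a generic, single-channel, EGNN-style coordinate update together with a numerical check. It is not HeMeNet's heterogeneous multichannel layer: the function names, the gate formula, the weight `w`, and the toy coordinates are all assumptions chosen for illustration. The check applies a fixed rotation, reflection, and translation `g` and compares "transform, then update" with "update, then transform".

```python
import math

def update(coords, feats, w=0.5):
    """Illustrative EGNN-style update (an assumption, not the paper's layer).

    coords: list of N [x, y, z] positions; feats: list of N scalar features.
    Each node moves along relative vectors x_i - x_j, scaled by a gate that
    depends only on invariant quantities (squared distance, node features).
    """
    n = len(coords)
    out = []
    for i in range(n):
        acc = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            dist2 = sum(d * d for d in diff)  # unchanged by rotation/translation
            gate = math.tanh(w * (feats[i] + feats[j]) / (1.0 + dist2))
            for k in range(3):
                acc[k] += gate * diff[k] / (n - 1)
        out.append([coords[i][k] + acc[k] for k in range(3)])
    return out

def g(coords):
    """A fixed E(3) element: rotate about z, rotate about x, reflect z, translate."""
    a, b = 0.7, -1.1
    out = []
    for x, y, z in coords:
        x, y = math.cos(a) * x - math.sin(a) * y, math.sin(a) * x + math.cos(a) * y
        y, z = math.cos(b) * y - math.sin(b) * z, math.sin(b) * y + math.cos(b) * z
        z = -z
        out.append([x + 1.0, y - 2.0, z + 0.5])
    return out

# Toy inputs (made up for this sketch).
coords = [[0.0, 0.0, 0.0], [1.2, 0.1, -0.3], [-0.5, 0.9, 0.4], [0.3, -1.1, 0.8]]
feats = [0.2, -0.4, 1.0, 0.5]

lhs = update(g(coords), feats)   # transform first, then update
rhs = g(update(coords, feats))   # update first, then transform
max_err = max(abs(p - q) for u, v in zip(lhs, rhs) for p, q in zip(u, v))
```

If the layer is equivariant, `max_err` should be at the level of floating-point round-off. The design point is that the gate sees only invariant inputs while the direction comes from relative vectors, so any rigid motion or reflection of the input carries through to the output.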

Abstract

Understanding and leveraging the 3D structures of proteins is central to a variety of biological and drug discovery tasks. While deep learning has been applied successfully for structure-based protein function prediction tasks, current methods usually employ distinct training for each task. However, each of the tasks is of small size, and such a single-task strategy hinders the models' performance and generalization ability. As some labeled 3D protein datasets are biologically related, combining multi-source datasets for larger-scale multi-task learning is one way to overcome this problem. In this paper, we propose a neural network model to address multiple tasks jointly upon the input of 3D protein structures. In particular, we first construct a standard structure-based multi-task benchmark called Protein-MT, consisting of 6 biologically relevant tasks, including affinity prediction and property prediction, integrated from 4 public datasets. Then, we develop a novel graph neural network for multi-task learning, dubbed Heterogeneous Multichannel Equivariant Network (HeMeNet), which is E(3) equivariant and able to capture heterogeneous relationships between different atoms. Besides, HeMeNet can achieve task-specific learning via the task-aware readout mechanism. Extensive evaluations on our benchmark verify the effectiveness of multi-task learning, and our model generally surpasses state-of-the-art models.
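The task-aware readout can be sketched under simplifying assumptions. The Figure 3 caption describes a task prompt used as the query for each task, producing an attention map over all nodes. The sketch below implements that idea as plain scaled dot-product attention with no learned projections and a single readout level. The function names, prompt values, and node features are hypothetical, and the paper's actual module may differ.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def task_aware_readout(node_feats, task_prompts):
    """Pool node features once per task, using the task prompt as the query.

    node_feats: list of N feature vectors of dimension d.
    task_prompts: dict mapping a task name to a prompt vector of dimension d.
    Returns a dict mapping each task name to a pooled d-dimensional vector.
    """
    d = len(node_feats[0])
    readouts = {}
    for task, prompt in task_prompts.items():
        # One attention score per node: scaled dot product of prompt and node.
        scores = [
            sum(p * h for p, h in zip(prompt, feat)) / math.sqrt(d)
            for feat in node_feats
        ]
        attn = softmax(scores)  # attention map over all nodes for this task
        readouts[task] = [
            sum(a * feat[k] for a, feat in zip(attn, node_feats))
            for k in range(d)
        ]
    return readouts

# Toy example: two nodes with one-hot features, two tasks with made-up prompts.
node_feats = [[1.0, 0.0], [0.0, 1.0]]
task_prompts = {"LBA": [5.0, 0.0], "EC": [0.0, 5.0]}
readouts = task_aware_readout(node_feats, task_prompts)
```

Because each task has its own query, the same node embeddings yield a different pooled vector per task: here the "LBA" prompt attends mostly to the first node and the "EC" prompt mostly to the second.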

Paper Structure

This paper contains 32 sections, 11 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Comparison of different models with tasks. Full-atom models (left) predict binding affinity with interface information; Alpha-Carbon models (right) predict protein functions with chain information. They need to be retrained for each task. HeMeNet (middle) supports various full-atom input information and predicts all six tasks simultaneously. We omit the edges for simplicity.
  • Figure 2: Construction of Protein-MT. We first extract the UniProt ID for each chain and construct a UniProt-Property dictionary to map the UniProt ID of each protein chain with EC and GO-MF, GO-BP, GO-CC labels annotated in the EC and GO datasets. With this dictionary, we can extract each chain's UniProt ID and map it with its labels. The complex with one affinity label and all property labels for each chain is defined as fully-labeled. We take most of the fully-labeled data for val/test and most of the partially labeled data for training.
  • Figure 3: Overview of our pipeline. Left: HeMeNet takes two-instance complexes or a single chain as input and predicts complex-level affinity and chain-level properties simultaneously. Middle: An example of the heterogeneous graph and the relational equivariant message passing. We only annotate a small part of our multi-channel full-atom graph for simplicity. Each edge is bidirectional, and we only mark the incoming edge arrow and self-loop for the center node. Right: Task-aware readout module. We take a task prompt as the query for each task, generating attention maps for all the nodes to get a multi-level readout for different downstream tasks.
  • Figure 4: PPA performance
  • Figure 5: Prompt correlation
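The labeling step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the record layout, the identifiers (`UP_A`, `UP_B`), the label strings, and the affinity value are all placeholders. It also assumes that "all property labels" means each chain has at least one label for each of EC, GO-MF, GO-BP, and GO-CC.

```python
PROPERTY_TASKS = ("EC", "GO-MF", "GO-BP", "GO-CC")

def build_property_dict(annotations):
    """Build a UniProt-Property dictionary.

    annotations: iterable of (uniprot_id, task, labels) records, as they might
    be collected from the EC and GO datasets (hypothetical layout).
    Returns {uniprot_id: {task: set_of_labels}}.
    """
    table = {}
    for uniprot_id, task, labels in annotations:
        table.setdefault(uniprot_id, {}).setdefault(task, set()).update(labels)
    return table

def label_complex(chain_uniprot_ids, affinity, property_dict):
    """Attach property labels to every chain of a complex.

    A complex counts as fully labeled when it has an affinity label and every
    chain has labels for all four property tasks (assumed interpretation).
    """
    chains = [(uid, property_dict.get(uid, {})) for uid in chain_uniprot_ids]
    fully_labeled = affinity is not None and all(
        task in labels for _, labels in chains for task in PROPERTY_TASKS
    )
    return {"affinity": affinity, "chains": chains, "fully_labeled": fully_labeled}

# Placeholder annotations: UP_A has all four property types, UP_B only GO-MF.
annotations = [
    ("UP_A", "EC", ["ec_1"]),
    ("UP_A", "GO-MF", ["mf_1"]),
    ("UP_A", "GO-BP", ["bp_1", "bp_2"]),
    ("UP_A", "GO-CC", ["cc_1"]),
    ("UP_B", "GO-MF", ["mf_2"]),
]
table = build_property_dict(annotations)

full = label_complex(["UP_A", "UP_A"], 6.2, table)     # both chains annotated
partial = label_complex(["UP_A", "UP_B"], 6.2, table)  # UP_B lacks EC, GO-BP, GO-CC
```

Following the caption, complexes like `full` would mostly go to the validation and test splits, while complexes like `partial` would mostly be used for training.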