Exploiting Hierarchical Interactions for Protein Surface Learning

Yiqun Lin, Liang Pan, Yi Li, Ziwei Liu, Xiaomeng Li

TL;DR

This work tackles protein interaction prediction by jointly modeling chemical and geometric information on protein surfaces. It introduces HCGNet, a dual-branch network with a Chemical Feature Propagation module that enforces hierarchical interactions between atoms/residues and surface points, enabling effective multi-scale feature fusion. Empirically, HCGNet achieves state-of-the-art performance on site prediction and interaction matching, outperforming the previous best by about 2.3% and 3.2% in ROC-AUC, respectively. The framework offers a general, scalable approach to biomolecular surface learning with potential applicability to other biomolecules and downstream tasks.

Abstract

Predicting interactions between proteins is one of the most important yet challenging problems in structural bioinformatics. Intrinsically, potential functional sites on protein surfaces are determined by both geometric and chemical features. However, existing works consider only handcrafted or individually learned chemical features from the atom type and extract geometric features independently. Here, we identify two key properties of effective protein surface learning: 1) Relationship among atoms: atoms are linked to each other by covalent bonds to form biomolecules rather than appearing alone, so modeling the relationship among atoms is important in chemical feature learning. 2) Hierarchical feature interaction: the neighboring residue effect validates the significance of hierarchical feature interaction among atoms and between surface points and atoms (or residues). In this paper, we present a principled deep learning framework, the Hierarchical Chemical and Geometric Feature Interaction Network (HCGNet), for protein surface analysis, which bridges chemical and geometric features with hierarchical interactions. Extensive experiments demonstrate that our method outperforms the prior state-of-the-art method by 2.3% on the site prediction task and by 3.2% on the interaction matching task. Our code is available at https://github.com/xmed-lab/HCGNet.

Paper Structure

This paper contains 18 sections, 13 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: (a-d) show a protein's amino acid sequence, surface, atoms, and cartoon structure (a simplified representation based on the secondary structure), respectively. (a) Due to the neighboring residue effect [wang2002investigation], multiscale relationships among atoms and between surface points and atoms should be considered in protein function analysis. (b-d) Our key idea is to model the hierarchical feature interactions between chemical (atom/residue) and geometric (surface) features for efficient protein surface learning.
  • Figure 2: A protein can be represented by three point sets: surface points for geometric shape analysis, and atom and residue points for chemical property analysis.
  • Figure 3: (a) In the SA module, a set of neighbor points is grouped and then passed through MLPs and max pooling for feature aggregation. (b) In the residual SA module, SA is first used to extract the local feature for the centroid point, and a residual connection then adds the input and output features. The linear layer is not necessary when $C$ is equal to $C'$.
  • Figure 4: The backbone of HCGNet is mainly composed of two branches for geometric (top) and chemical (bottom) feature learning. The two branches encode features from surface points and atom points, respectively, in a multiscale way. Chemical feature propagation modules (middle) propagate features from the chemical branch to the geometric branch, also in a hierarchical way. Different task-oriented heads can be attached to handle different downstream tasks, such as site prediction and interaction matching. "d.s." indicates point cloud downsampling. $r_s$ and $r_a$ are the initial query radii for the geometric and chemical branches, respectively.
  • Figure 5: Implementation details of the feature aggregation function $g(\cdot)$ in CFP. $g(\cdot)$ operates on each surface point to query chemical features from neighboring atoms (near) and residues (far). We use MLPs to transform features and summation to aggregate features from neighbors.
  • ...and 3 more figures
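
The captions for Figures 3 and 5 describe three operations: grouping neighbors followed by MLPs and max pooling (SA), adding the input back to the SA output (residual SA), and summing MLP-transformed atom and residue features around a surface point ($g(\cdot)$ in CFP). The following is a minimal, dependency-free sketch of those operations and not the authors' implementation. Several details are assumptions made for illustration: a single bias-free linear layer with ReLU stands in for the MLPs, neighbors are selected with a simple ball query, relative coordinates are concatenated to neighbor features inside SA, and all function and parameter names (`set_abstraction`, `residual_sa`, `cfp_aggregate`, `r_near`, `r_far`, and so on) are hypothetical.

```python
import math


def linear(x, weight):
    """Bias-free linear layer; `weight` holds one row per output channel."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in weight]


def relu(x):
    return [max(0.0, v) for v in x]


def ball_query(center, points, radius):
    """Indices of all points within `radius` of `center` (assumed grouping rule)."""
    return [i for i, p in enumerate(points) if math.dist(center, p) <= radius]


def set_abstraction(centroids, points, feats, radius, weight):
    """SA sketch: group neighbors per centroid, transform them, then max-pool."""
    out = []
    for c in centroids:
        idx = ball_query(c, points, radius)
        if not idx:  # no neighbors found: fall back to a zero feature
            out.append([0.0] * len(weight))
            continue
        # Assumed per-neighbor input: relative xyz concatenated with its feature.
        hidden = [
            relu(linear([p - q for p, q in zip(points[i], c)] + feats[i], weight))
            for i in idx
        ]
        # Max pooling over the neighbors, channel by channel.
        out.append([max(channel) for channel in zip(*hidden)])
    return out


def residual_sa(points, feats, radius, weight, proj=None):
    """Residual SA sketch: every point is its own centroid; add input to output.

    `proj` is the optional linear layer on the skip path, needed only when the
    input width C differs from the output width C'.
    """
    local = set_abstraction(points, points, feats, radius, weight)
    skip = feats if proj is None else [linear(f, proj) for f in feats]
    return [[s + l for s, l in zip(sv, lv)] for sv, lv in zip(skip, local)]


def cfp_aggregate(surface_pt, atoms, atom_feats, residues, res_feats,
                  r_near, r_far, w_atom, w_res):
    """g(.) sketch: sum transformed features of near atoms and far residues."""
    total = [0.0] * len(w_atom)
    sources = ((atoms, atom_feats, r_near, w_atom),
               (residues, res_feats, r_far, w_res))
    for pts, fts, radius, w in sources:
        for i in ball_query(surface_pt, pts, radius):
            total = [t + h for t, h in zip(total, relu(linear(fts[i], w)))]
    return total


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [5.0, 0.0, 0.0]]
    fts = [[1.0, 2.0], [3.0, 0.5], [0.0, 4.0]]
    # Toy weights that ignore xyz and pass the two feature channels through.
    w = [[0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
    print(residual_sa(pts, fts, radius=1.0, weight=w))
```

In this sketch the skip path reuses the input features directly when `proj` is `None`, which mirrors the caption's remark that the linear layer can be dropped when $C = C'$. A real implementation would use batched tensor operations and learned multi-layer MLPs rather than Python lists.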