CCPL: Cross-modal Contrastive Protein Learning

Jiangbin Zheng, Stan Z. Li

TL;DR

This work introduces cross-modal contrastive protein learning (CCPL), a novel unsupervised pretraining method for protein structure representations. CCPL leverages a robust protein language model and uses unsupervised contrastive alignment to enhance structure learning, while self-supervised structural constraints preserve intrinsic structural information.

Abstract

Effective protein representation learning is crucial for predicting protein functions. Traditional methods often pretrain protein language models on large, unlabeled amino acid sequences, followed by finetuning on labeled data. While effective, these methods underutilize the potential of protein structures, which are vital for function determination. Common structural representation techniques rely heavily on annotated data, limiting their generalizability. Moreover, structural pretraining methods, similar to natural language pretraining, can distort actual protein structures. In this work, we introduce a novel unsupervised protein structure representation pretraining method, cross-modal contrastive protein learning (CCPL). CCPL leverages a robust protein language model and uses unsupervised contrastive alignment to enhance structure learning, incorporating self-supervised structural constraints to maintain intrinsic structural information. We evaluated our model across various benchmarks, demonstrating the framework's superiority.

Paper Structure

This paper contains 19 sections, 7 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: (a) The proposed cross-modal contrastive learning framework uses a pretrained protein language model to guide the training of the protein structure model through a contrastive alignment loss. To reinforce information constraints on the structure, we introduce a self-supervised contact map prediction task. (b) The internal and external evaluation tasks for our trained structure model during the inference phase.
  • Figure 2: Schematic diagram of the GVP module: Protein backbone atoms (C, C$_\alpha$, and N) form the basis for generating graphs with node and edge features based on k-nearest neighbor relationships. These graphs are fed into the vector and scalar channels of the GVP module to produce vector and scalar features. These primary features are then enhanced with additional spatial features, including rotation frame, sidechain, orientation, and dihedral characteristics, to create comprehensive spatial structure features.
  • Figure 3: Various alignment levels. (a) Residue-level alignment entails comparing each pair of structure-sequence features residue by residue. (b) Protein-level alignment involves comparing each pair of structure-sequence features protein by protein. The features of each protein are amalgamated from all the residue features contained within it. Identical colors indicate a sequence-structure pair originating from the same protein.
  • Figure 4: Pipeline for reconstructing the contact map based on C$_\beta$ atoms with length $L$: First, attention maps are extracted from each layer of the self-attention blocks. These maps undergo symmetrization and average product correction (APC) along the amino acid dimensions to produce an $L \times L$ coupling matrix. This matrix forms the basis for the final contact map predictions, which are refined using a regression layer.
  • Figure 5: Protein function prediction tasks. (a) Comparison of fold-level predictions. P@k denotes the top-k precision. (b) Performance on the enzyme recognition task.
  • ...and 2 more figures
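
The symmetrization and average product correction (APC) steps named in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it processes a single $L \times L$ attention map in plain Python, the helper names (`symmetrize`, `apc`, `coupling_matrix`) and the toy values are hypothetical, and the final regression layer and the way maps from different layers are combined are omitted because the caption does not specify them. APC here is the standard form $F^{\mathrm{APC}}_{ij} = F_{ij} - F_{i\cdot} F_{\cdot j} / F_{\cdot\cdot}$, where $F_{i\cdot}$, $F_{\cdot j}$, and $F_{\cdot\cdot}$ are the row, column, and total sums.

```python
def symmetrize(a):
    """Make a square matrix symmetric by adding its transpose."""
    n = len(a)
    return [[a[i][j] + a[j][i] for j in range(n)] for i in range(n)]


def apc(a):
    """Average product correction: subtract (row_sum * col_sum) / total_sum."""
    n = len(a)
    row_sums = [sum(row) for row in a]
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    return [
        [a[i][j] - row_sums[i] * col_sums[j] / total for j in range(n)]
        for i in range(n)
    ]


def coupling_matrix(attention):
    """Symmetrize an L x L attention map, then apply APC."""
    return apc(symmetrize(attention))


# Toy 3 x 3 attention map (made-up values, for illustration only).
toy_attention = [
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.1, 0.4],
]
coupling = coupling_matrix(toy_attention)
```

Because the input to `apc` is symmetric, the resulting coupling matrix stays symmetric, and each of its rows sums to zero up to floating-point error. In a full pipeline, one such matrix per attention map would presumably be passed as a feature to the regression layer that produces the contact map.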