ProtAlign: Contrastive learning paradigm for Sequence and structure alignment

Aditya Ranganath, Hasin Us Sami, Kowshik Thopalli, Bhavya Kailkhura, Wesam Sakla

TL;DR

This paper introduces a sequence-structure contrastive alignment framework. It learns a shared embedding space in which proteins are represented consistently across modalities, and it maximizes agreement between matched sequence-structure pairs while pushing apart unrelated pairs.

Abstract

Protein language models often account for the alignment between a protein sequence and its textual description, but they do not incorporate structural information. Traditional methods treat sequence and structure separately, which limits their ability to exploit the alignment between structure and sequence embeddings. In this paper, we introduce a sequence-structure contrastive alignment framework that learns a shared embedding space in which proteins are represented consistently across modalities. The model is trained on large-scale pairs of sequences and experimentally resolved or predicted structures. It maximizes agreement between matched sequence-structure pairs while pushing apart unrelated pairs. This alignment enables cross-modal retrieval (e.g., finding structural neighbors given a sequence). It also improves downstream prediction tasks such as function annotation and stability estimation, and it provides interpretable links between sequence variation and structural organization. Our results demonstrate that contrastive learning can serve as a powerful bridge between protein sequences and structures, offering a unified representation for understanding and engineering proteins.
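The abstract describes the training objective only qualitatively. One common way to maximize agreement between matched sequence-structure pairs while pushing apart unrelated pairs is a CLIP-style symmetric InfoNCE loss over cosine similarities. The minimal pure-Python sketch below assumes that loss and treats every non-matching pair in a batch as a negative. It borrows the temperature $\tau = 0.07$ from the Figure 4 caption. All function names are hypothetical, the toy 2-D vectors stand in for real encoder outputs, and this is not presented as the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_similarity_matrix(seq_emb, struct_emb):
    """S[i][j] = cosine similarity between sequence i and structure j."""
    seqs = [l2_normalize(v) for v in seq_emb]
    structs = [l2_normalize(v) for v in struct_emb]
    return [[sum(a * b for a, b in zip(s, t)) for t in structs] for s in seqs]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy over rows, where row i's correct column is i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(seq_emb, struct_emb, tau=0.07):
    """Symmetric InfoNCE: average of sequence->structure and structure->sequence terms."""
    sim = cosine_similarity_matrix(seq_emb, struct_emb)
    logits = [[s / tau for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose: structures as rows
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


def retrieve_structure(seq_vec, struct_emb):
    """Cross-modal retrieval: index of the structure most similar to one sequence."""
    sims = cosine_similarity_matrix([seq_vec], struct_emb)[0]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy batch of 2 matched pairs: row i of seq_emb is paired with row i of the structures.
seq_emb = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(clip_style_loss(seq_emb, matched) < clip_style_loss(seq_emb, swapped))  # True
print(retrieve_structure([1.0, 0.0], matched))  # 0
```

Dividing by a small $\tau$ sharpens the softmax, so a structure that is almost as similar as the true partner is still penalized noticeably. `retrieve_structure` shows how the same similarity matrix supports the cross-modal retrieval described in the abstract.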

Paper Structure (5 sections, 3 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: ProtAlign. Fig. (a) shows our proposed model, which consists of a multi-head self-attention (MSA) layer and a LayerNorm layer. Within the MSA layer, a learnable token serves as the query and the structure/sequence embeddings serve as keys and values. Fig. (b) shows the alignment protocol used to train the model.
  • Figure 2: t-SNE plots of the embeddings. Fig. (a) shows the 2D projection of the protein sequence and structure embeddings before training, while Fig. (b) shows the same projection after training.
  • Figure 3: Training loss over epochs for CLIP vs. SigLIP.
  • Figure 4: Heatmap of cosine similarity between all possible sequence-structure pairs ($\tau=0.07$). The $y$-axis represents the sequences and the $x$-axis represents the structures.
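The Figure 1 caption describes the pooling head only at the block level. Below is a minimal sketch of one reading of it. A single learnable query token attends over a set of sequence or structure embeddings, which serve as keys and values, using multi-head attention, and the result passes through LayerNorm. Several details are assumptions for illustration rather than the authors' implementation. These include treating the inputs as per-residue embeddings, the Q/K/V/output projection matrices, the head count, the Gaussian initialization, and LayerNorm without learnable scale and shift. The class name `LearnableTokenAttentionPool` is also hypothetical. The weights here are random, whereas in training they and the query token would be learned.

```python
import math
import random


def random_matrix(rng, rows, cols):
    """Gaussian-initialized weight matrix (stand-in for trained parameters)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(vec, eps=1e-5):
    """LayerNorm without the learnable scale and shift, for brevity."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


class LearnableTokenAttentionPool:
    """Pools L input embeddings (L x dim) into a single dim-sized vector.

    One learnable query token attends over the embeddings, which act as keys
    and values; the concatenated head outputs are projected, then LayerNormed.
    """

    def __init__(self, dim, num_heads, seed=0):
        if dim % num_heads != 0:
            raise ValueError("dim must be divisible by num_heads")
        rng = random.Random(seed)
        self.dim, self.num_heads = dim, num_heads
        self.query_token = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # learnable token
        self.w_q, self.w_k, self.w_v, self.w_o = (
            random_matrix(rng, dim, dim) for _ in range(4)
        )

    def __call__(self, tokens):
        d_head = self.dim // self.num_heads
        q = matvec(self.w_q, self.query_token)
        keys = [matvec(self.w_k, t) for t in tokens]
        values = [matvec(self.w_v, t) for t in tokens]
        concat = []
        for h in range(self.num_heads):
            lo, hi = h * d_head, (h + 1) * d_head
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(d_head)
                for k in keys
            ]
            weights = softmax(scores)  # attention of the query token over inputs
            concat.extend(
                sum(w * v[j] for w, v in zip(weights, values)) for j in range(lo, hi)
            )
        return layer_norm(matvec(self.w_o, concat))


rng = random.Random(1)
residues = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # L=5, dim=8
pool = LearnableTokenAttentionPool(dim=8, num_heads=2)
print(len(pool(residues)))  # 8
```

With a single query token, inputs of any length L map to one fixed-size vector. Sequence and structure embeddings can therefore be compared directly in the shared space. The pooling step itself is also invariant to the order of its input embeddings, so any positional information must already be carried by the encoder outputs.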
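Figure 3 compares CLIP and SigLIP training. For contrast with the softmax-based objective, below is a minimal sketch of a SigLIP-style pairwise sigmoid loss applied to sequence-structure pairs. It makes three assumptions. It uses cosine similarities. It fixes the scale at t = 10 and the bias at b = -10, the initial values suggested in the SigLIP paper, where both are learnable. It normalizes the loss by batch size. The function names are hypothetical, and the sketch does not reproduce the training curves in Figure 3.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def siglip_style_loss(seq_emb, struct_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss: each (sequence, structure) pair is a binary label.

    Matched pairs (i == j) get label +1 and all other pairs -1; the summed
    negative log-likelihood is divided by the batch size.
    """
    seqs = [l2_normalize(v) for v in seq_emb]
    structs = [l2_normalize(v) for v in struct_emb]
    total = 0.0
    for i, s in enumerate(seqs):
        for j, u in enumerate(structs):
            label = 1.0 if i == j else -1.0
            logit = t * sum(x * y for x, y in zip(s, u)) + b
            total -= log_sigmoid(label * logit)
    return total / len(seqs)


seq_emb = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(siglip_style_loss(seq_emb, matched) < siglip_style_loss(seq_emb, swapped))  # True
```

Each pair is scored independently, so this loss needs no softmax normalization across the batch. The negative bias offsets the imbalance of one positive against n - 1 negatives in every row.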