Transformers in Protein: A Survey

Xiaowen Ling, Zhiqiang Li, Yanbin Wang, Zhuhong You

TL;DR

This survey addresses the growing role of Transformer models in protein informatics by organizing over 100 studies into four core domains: structure prediction, function annotation, protein-protein interactions (PPIs), and drug discovery. It synthesizes architectural innovations (self-attention, multi-head mechanisms, and pretraining based on self-supervised learning), protein-specific derivatives, and multimodal integrations, while compiling essential datasets and open-source resources to support reproducibility. The authors identify four persistent challenges (computational scalability, data quality, interpretability, and cross-domain generalization) and propose directions such as hybrid physics-informed modeling, multimodal data fusion, and standardized benchmarks. By detailing practical progress and offering a consolidated roadmap, the paper highlights how Transformer-based approaches can accelerate protein science and therapeutic discovery in a data-rich, interdisciplinary era.

Abstract

As protein informatics advances rapidly, the demand for enhanced predictive accuracy, structural analysis, and functional understanding has intensified. Transformer models, as powerful deep learning architectures, have demonstrated unprecedented potential in addressing diverse challenges across protein research. However, a comprehensive review of Transformer applications in this field remains lacking. This paper bridges this gap by surveying over 100 studies, offering an in-depth analysis of practical implementations and research progress of Transformers in protein-related tasks. Our review systematically covers critical domains, including protein structure prediction, function prediction, protein-protein interaction analysis, functional annotation, and drug discovery/target identification. To contextualize these advancements across various protein domains, we adopt a domain-oriented classification system. We first introduce foundational concepts: the Transformer architecture and attention mechanisms, categorize Transformer variants tailored for protein science, and summarize essential protein knowledge. For each research domain, we outline its objectives and background, critically evaluate prior methods and their limitations, and highlight transformative contributions enabled by Transformer models. We also curate and summarize pivotal datasets and open-source code resources to facilitate reproducibility and benchmarking. Finally, we discuss persistent challenges in applying Transformers to protein informatics and propose future research directions. This review aims to provide a consolidated foundation for the synergistic integration of Transformer and protein informatics, fostering further innovation and expanded applications in the field.

Paper Structure

This paper contains 25 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Analysis of Transformer models in protein research using data from the Web of Science Core Collection. (a) Number of publications and citations over the past four years, highlighting the growing influence of this research area. (b) Distribution of publications across prestigious journals such as Nature, Science, and Cell. (c) Publications categorized by subfield from 2022 to 2025 (year to date), illustrating the diverse applications of Transformer models in protein research.
  • Figure 2: Architecture of the Transformer model [1]. Originally proposed for machine translation, the Transformer maps a source-language input sequence to a target-language output sequence through two pathways. (1) The encoder embeds the input tokens and passes them through N identical blocks, each containing multi-head attention and a feed-forward layer, to produce continuous representations. (2) The decoder autoregressively generates output tokens by attending to both the encoded source sequence and its own right-shifted inputs. During training, the target sequence is shifted right by prepending an ⟨SOS⟩ token so that the model cannot trivially copy its input, and the loss is computed against the original sequence with an ⟨EOS⟩ token appended. Both stacks consist of N modular layers that combine multi-head attention, position-wise feed-forward networks, and residual connections with layer normalization, enabling effective modeling of cross-lingual structural dependencies.
  • Figure 3: Hierarchical Classification of Self-Attention Mechanisms in Protein Models. This diagram illustrates the categorization of self-attention mechanisms adapted from vision transformer literature. The top-level distinction is made between Single-Head Self-Attention and Multi-Head Self-Attention. Under Multi-Head Self-Attention, several extensions are shown, including standard Transformer models, Pre-trained Transformers such as ESM and ProtBERT, Adapter-based Transformers, and Multi-modal Attention mechanisms that integrate sequence and structure or sequence and function. Additional branches include Spatial Self-Attention, which encompasses Graph-based attention and SE(3)-Equivariant Attention, and Hybrid Attention, which combines CNN with Transformer architectures.
  • Figure 4: Illustrative diagram of three fundamental supervised learning tasks. Supervised learning in machine learning (ML) is typically categorized into classification and regression tasks. Two-dimensional representations are used here for conceptual clarity, although real-world datasets often reside in high-dimensional feature spaces. (a) In binary classification, each sample belongs to one of two possible categories. For instance, a model may classify protein variants as either stable or unstable [199], or determine from sequence-derived features whether a protein is a G-protein-coupled receptor [200]. (b) Multi-class classification assigns each sample to one of several discrete classes. For example, recent studies have developed ML models that predict the subcellular localization of human proteins (such as nucleus, cytoplasm, mitochondria, and extracellular regions) from features extracted from immunohistochemistry images [201]. (c) In regression tasks, the goal is to predict continuous numerical properties of proteins. For instance, recent models estimate protein solubility levels directly from sequence-derived or structural features, enabling fine-grained prediction beyond binary soluble/insoluble classification [202].
  • Figure 5: The structure of the BERT model. This figure shows the Bidirectional Encoder Representations from Transformers (BERT) architecture, highlighting both pre-training and fine-tuning phases.
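
The right-shift described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `make_decoder_pair` and `causal_mask`, the token strings, and the amino-acid letters used as a toy target are all hypothetical choices for illustration. The causal mask is not mentioned in the caption; it is included on the assumption that the decoder follows the standard autoregressive Transformer design, where each position may only attend to itself and earlier positions.

```python
# Illustrative sketch only; names and tokens are assumptions, not the paper's code.
SOS, EOS = "<SOS>", "<EOS>"


def make_decoder_pair(target_tokens):
    """Build the decoder input and the loss target for one training sequence.

    The decoder input is the target shifted right (SOS prepended), and the
    loss target is the original sequence with EOS appended, so that the
    token fed at step t is paired with the token expected at step t.
    """
    decoder_input = [SOS] + list(target_tokens)
    loss_target = list(target_tokens) + [EOS]
    return decoder_input, loss_target


def causal_mask(length):
    """Return a length x length boolean mask.

    mask[i][j] is True when position i is allowed to attend to position j,
    which in an autoregressive decoder means j <= i.
    """
    return [[j <= i for j in range(length)] for i in range(length)]


if __name__ == "__main__":
    # Toy target: three amino-acid letters standing in for output tokens.
    target = ["M", "K", "V"]
    dec_in, labels = make_decoder_pair(target)
    for step, (seen, expected) in enumerate(zip(dec_in, labels)):
        print(f"step {step}: decoder sees {seen!r}, loss expects {expected!r}")
    for row in causal_mask(len(dec_in)):
        print(["x" if allowed else "." for allowed in row])
```

Because the input and the loss target have the same length and are offset by one position, the model is always trained to predict the next token rather than to reproduce the token it was just given.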