Sequence-Only Prediction of Binding Affinity Changes: A Robust and Interpretable Model for Antibody Engineering

Chen Liu, Mingchen Li, Yang Tan, Wenrui Gou, Guisheng Fan, Bingxin Zhou

TL;DR

This paper addresses the prediction of mutation-induced binding affinity changes in antibody–antigen complexes without relying on structural data. It introduces ProtAttBA, a cross-attention model that uses frozen pre-trained protein language models to encode four sequences (the wild-type and mutant versions of both the antibody and the antigen) and regresses $\Delta\Delta G_{bind}$ from their representations. Across three open benchmarks and multiple data splits, ProtAttBA achieves accuracy competitive with both sequence- and structure-based baselines and remains robust when input structures are uncertain. The attention weights also provide interpretability by highlighting residues that contribute to affinity changes, suggesting practical utility for rapid and cost-effective antibody engineering.

Abstract

A pivotal goal of antibody engineering research is finding modifications that enhance antibody-antigen binding affinity. Assessing mutants with traditional wet-lab experiments is costly and time-consuming. Emerging deep learning solutions offer an alternative by modeling antibody structures to predict binding affinity changes. However, they depend heavily on high-quality complex structures, which are frequently unavailable in practice. We therefore propose ProtAttBA, a deep learning model that predicts binding affinity changes based solely on the sequence information of antibody-antigen complexes. ProtAttBA employs a pre-training phase to learn protein sequence patterns, followed by a supervised training phase that uses labeled antibody-antigen complex data to train a cross-attention-based regressor for predicting binding affinity changes. We evaluated ProtAttBA on three open benchmarks under different conditions. Compared to both sequence- and structure-based prediction methods, our approach achieves competitive performance and is notably robust, especially when complex structures are uncertain. The attention mechanism also makes the method interpretable: we show that the learned attention scores can identify critical residues that affect binding affinity. This work introduces a rapid and cost-effective computational tool for antibody engineering, with the potential to accelerate the development of novel therapeutic antibodies.

Paper Structure

This paper contains 15 sections, 7 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of the ProtAttBA architecture. The model predicts changes in antigen–antibody binding affinity ($\Delta\Delta G$) caused by amino acid mutations. Given wild-type and mutant sequence pairs, ProtAttBA first encodes antibody and antigen sequences using a frozen pre-trained protein language model to generate contextualized residue embeddings $\{\mathbf{H}_{\rm ab}^{\rm wt}, \mathbf{H}_{\rm ag}^{\rm wt}, \mathbf{H}_{\rm ab}^{\rm mt}, \mathbf{H}_{\rm ag}^{\rm mt}\}$. The attention module then applies convolutional neural networks with dual multi-head cross-attention to yield refined representations $\{\mathbf{H}_{\rm ab}^{\rm wt^\prime}, \mathbf{H}_{\rm ag}^{\rm wt^\prime}, \mathbf{H}_{\rm ab}^{\rm mt^\prime}, \mathbf{H}_{\rm ag}^{\rm mt^\prime}\}$ and the corresponding pooled feature vectors $\{\mathbf{f}_{\rm ab}^{\rm wt}, \mathbf{f}_{\rm ag}^{\rm wt}, \mathbf{f}_{\rm ab}^{\rm mt}, \mathbf{f}_{\rm ag}^{\rm mt}\}$ (see the method sections of the paper that describe these modules). Finally, the prediction module concatenates wild-type and mutant features and regresses the $\Delta\Delta G$ value.
  • Figure 2: Ablation comparing the attention-based and MLP-based variants of ProtAttBA, measured by PCC (top) and $R^2$ (bottom), on the three benchmark datasets.
  • Figure 3: Protein structure visualization for interpretability analysis. Panels a, b, c, and d show localized views of the antibody-antigen complex at the mutation site, before and after mutation. The antigen chain is highlighted in green. Panels e and f show the attention weight matrices learned by the model, where cooler colors (tending towards blue) mark positions to which the model assigns higher importance for their interaction with the mutated residue.
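
The data flow described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for the frozen language-model embeddings, the convolutional layers are omitted, the attention is single-head rather than multi-head, mean pooling and a linear regression head are assumed, and all names (`cross_attention`, `encode_pair`, `predict_ddg`, `params`) and dimensions are hypothetical. The attention matrix it returns is the kind of antibody-by-antigen weight map that Figure 3 visualizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, key_seq, w_q, w_k, w_v):
    # Single-head cross-attention: residues of query_seq attend to key_seq.
    q = query_seq @ w_q                                  # (Lq, d)
    k = key_seq @ w_k                                    # (Lk, d)
    v = key_seq @ w_v                                    # (Lk, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (Lq, Lk)
    return weights @ v, weights

def encode_pair(h_ab, h_ag, params):
    # Dual cross-attention (antibody->antigen and antigen->antibody),
    # then mean pooling over residues (pooling choice is an assumption).
    ab_refined, attn_ab_to_ag = cross_attention(h_ab, h_ag, *params["ab"])
    ag_refined, _ = cross_attention(h_ag, h_ab, *params["ag"])
    return ab_refined.mean(axis=0), ag_refined.mean(axis=0), attn_ab_to_ag

def predict_ddg(h_ab_wt, h_ag_wt, h_ab_mt, h_ag_mt, params):
    # Concatenate wild-type and mutant features, then apply a linear head.
    f_ab_wt, f_ag_wt, _ = encode_pair(h_ab_wt, h_ag_wt, params)
    f_ab_mt, f_ag_mt, attn_mt = encode_pair(h_ab_mt, h_ag_mt, params)
    features = np.concatenate([f_ab_wt, f_ag_wt, f_ab_mt, f_ag_mt])
    ddg = float(features @ params["w_out"] + params["b_out"])
    return ddg, attn_mt

# Hypothetical sizes: embedding width d, antibody and antigen lengths.
d, len_ab, len_ag = 16, 12, 20

def random_projections():
    # Untrained query/key/value projection matrices (illustration only).
    return [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]

params = {
    "ab": random_projections(),
    "ag": random_projections(),
    "w_out": rng.normal(size=4 * d),
    "b_out": 0.0,
}

# Random stand-ins for frozen protein language model embeddings.
h_ab_wt = rng.normal(size=(len_ab, d))
h_ag_wt = rng.normal(size=(len_ag, d))
h_ab_mt = h_ab_wt.copy()
h_ab_mt[5] += rng.normal(size=d)   # perturb one residue to mimic a point mutation
h_ag_mt = h_ag_wt.copy()

ddg, attn = predict_ddg(h_ab_wt, h_ag_wt, h_ab_mt, h_ag_mt, params)
print("predicted ddG (untrained, arbitrary units):", ddg)
print("attention map shape (antibody x antigen):", attn.shape)
```

Because the weights are untrained, the predicted value is meaningless; the sketch only shows how four residue-embedding matrices are reduced to four pooled vectors and one scalar, and that row 5 of `attn` gives the mutated antibody residue's attention over antigen positions.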