AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction

Chunan Liu, Lilian Denzler, Yihong Chen, Andrew Martin, Brooks Paige

TL;DR

WALLE, a novel method, combines unstructured modeling from protein language models with structural modeling from graph neural networks to improve epitope prediction.

Abstract

Epitope identification is vital for antibody design yet challenging due to the inherent variability in antibodies. While many deep learning methods have been developed for general protein binding site prediction tasks, whether they work for epitope prediction remains an understudied research question. The challenge is also heightened by the lack of a consistent evaluation pipeline with sufficient dataset size and epitope diversity. We introduce a filtered antibody-antigen complex structure dataset, AsEP (Antibody-specific Epitope Prediction). AsEP is the largest of its kind and provides clustered epitope groups, allowing the community to develop and test novel epitope prediction methods and evaluate their generalisability. AsEP comes with an easy-to-use interface in Python and pre-built graph representations of each antibody-antigen complex while also supporting customizable embedding methods. Using this new dataset, we benchmark several representative general protein-binding site prediction methods and find that their performance falls short of expectations for epitope prediction. To address this, we propose a novel method, WALLE, which leverages both unstructured modeling from protein language models and structural modeling from graph neural networks. WALLE demonstrates up to a 3-10x performance improvement over the baseline methods. Our empirical findings suggest that epitope prediction benefits from combining sequential features provided by language models with geometrical information from graph representations. This provides a guideline for future epitope prediction method design. In addition, we reformulate the task as bipartite link prediction, allowing convenient model performance attribution and interpretability. We open source our data and code at https://github.com/biochunan/AsEP-dataset.
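
The bipartite link prediction formulation mentioned in the abstract can be illustrated with a small sketch. It is not the WALLE implementation: the function name `epitope_from_links`, the score matrix layout (antibody residues as rows, antigen residues as columns), and the 0.5 threshold are all assumptions made for illustration. The idea shown is that once links between antibody and antigen residues are scored, epitope residues can be read off as antigen residues with at least one predicted link, and each prediction can be attributed to specific antibody residues.

```python
from typing import Dict, List


def epitope_from_links(
    link_probs: List[List[float]], threshold: float = 0.5
) -> Dict[int, List[int]]:
    """Map each predicted epitope residue (antigen index) to the antibody
    residue indices it is predicted to contact.

    link_probs[i][j] is a hypothetical predicted probability of a link
    between antibody residue i and antigen residue j.
    """
    partners: Dict[int, List[int]] = {}
    for ab_idx, row in enumerate(link_probs):
        for ag_idx, prob in enumerate(row):
            if prob >= threshold:  # assumed decision threshold
                partners.setdefault(ag_idx, []).append(ab_idx)
    return partners


# Toy scores: 2 antibody residues (rows) x 3 antigen residues (columns).
toy_probs = [
    [0.9, 0.2, 0.1],
    [0.7, 0.1, 0.6],
]
partners = epitope_from_links(toy_probs)
epitope = sorted(partners)  # antigen residues with at least one predicted link
print(epitope)   # [0, 2]
print(partners)  # {0: [0, 1], 2: [1]}
```

In this toy case, antigen residue 0 is predicted as an epitope residue because of links to both antibody residues, which is the kind of per-residue attribution a node-level classifier alone would not expose.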

Paper Structure

This paper contains 35 sections, 7 equations, 6 figures, and 11 tables.

Figures (6)

  • Figure 1: An example illustrating interacting residues. The two dashed lines indicate distances between non-hydrogen atoms from different interacting residues across two protein chains, with each chain's carbon atoms colored cyan and green.
  • Figure 2: Graph visualization of an antibody-antigen complex. Top: the molecular structure of an antibody complexed with the receptor binding domain of the SARS-CoV-2 virus (PDB code: 7KFW), the antigen. Spheres indicate the alpha carbon atoms of each amino acid. Color scheme: the antigen is colored magenta, the framework regions of the heavy and light chains are colored green and cyan, respectively, and the CDR 1-3 loops are colored blue, yellow, and red, respectively. Bottom: the corresponding graph. Green vertices are antibody CDR residues and pink vertices are antigen surface residues.
  • Figure 3: A schematic of the preprocessing step that turns an input antibody-antigen complex structure into a graph pair and the model architecture of WALLE.
  • Figure S1: Pipeline to convert an antibody-antigen complex structure into a graph representation.
  • Figure S2: Blue line: distribution of the number of residue-residue contacts in antibody-antigen interface across the dataset with a mean and median of 43.27 and 43.00, respectively. Red line: fitted normal distribution with mean and standard deviation of 43.27 and 10.80, respectively.
  • ...and 1 more figure
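
The notion of interacting residues in the Figure 1 caption (distances between non-hydrogen atoms of residues on different chains) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset's own code: the `Atom` tuple layout, the function name `residues_interact`, and the 4.5 Å cutoff are hypothetical, since this excerpt does not state the threshold used.

```python
import math
from typing import List, Tuple

# Hypothetical atom record: (element symbol, xyz coordinates in angstroms).
Atom = Tuple[str, Tuple[float, float, float]]


def residues_interact(
    res_a: List[Atom], res_b: List[Atom], cutoff: float = 4.5
) -> bool:
    """Return True if any pair of non-hydrogen atoms, one from each residue,
    lies within `cutoff` angstroms (the cutoff value is an assumption)."""
    heavy_a = [xyz for element, xyz in res_a if element != "H"]
    heavy_b = [xyz for element, xyz in res_b if element != "H"]
    return any(math.dist(p, q) <= cutoff for p in heavy_a for q in heavy_b)


# Toy residues placed along the x-axis.
antibody_res = [("C", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0))]
antigen_near = [("O", (3.0, 0.0, 0.0))]
# The hydrogens here are 4.0 angstroms apart, but hydrogens are ignored;
# the closest heavy-atom pair (C and N) is 6.0 angstroms apart.
antigen_far = [("N", (6.0, 0.0, 0.0)), ("H", (5.0, 0.0, 0.0))]

near_contact = residues_interact(antibody_res, antigen_near)
far_contact = residues_interact(antibody_res, antigen_far)
print(near_contact, far_contact)  # True False
```

Applying such a test to every antibody-antigen residue pair would yield the inter-chain contacts from which graph edges like those in Figure 2 could be drawn.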