A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules

Chenpeng Yu, Xing Fang, Hui Liu

TL;DR

This work proposes a deep learning model based on the cross-attention mechanism that simultaneously predicts peptide–HLA and peptide–TCR binding, providing a more comprehensive evaluation of antigen immunogenicity.

Abstract

Immune checkpoint inhibitors have demonstrated promising clinical efficacy across various tumor types, yet the percentage of patients who benefit from them remains low. The binding of tumor antigens to HLA-I and TCR molecules determines antigen presentation and T-cell activation, and thereby plays an important role in the immunotherapy response. In this paper, we propose UnifyImmun, a unified cross-attention transformer model designed to simultaneously predict the binding of peptides to both receptors, providing a more comprehensive evaluation of antigen immunogenicity. We devise a two-phase training strategy using virtual adversarial training that enables the two tasks to reinforce each other by compelling the encoders to extract more expressive features. Our method demonstrates superior performance in predicting both pHLA and pTCR binding on multiple independent and external test sets. Notably, on a large-scale COVID-19 pTCR binding test set containing no peptides seen in the training set, our method outperforms the current state-of-the-art methods by more than 10%. The predicted binding scores correlate significantly with immunotherapy response and clinical outcomes in two clinical cohorts. Furthermore, the cross-attention scores and integrated gradients reveal the amino-acid sites critical for peptide binding to receptors. In essence, our approach marks a significant step toward comprehensive evaluation of antigen immunogenicity.
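
To make the cross-attention idea concrete, here is a minimal sketch of scaled dot-product cross-attention in which peptide positions act as queries and receptor positions act as keys and values. It is not the UnifyImmun implementation: the function names, the random `toy_embed` embedding table, the embedding size, and the two example sequences are assumptions chosen for illustration, and the learned projection matrices and multi-head structure of a real transformer layer are omitted.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention from one sequence (queries) to another.

    queries: list of Lq vectors (e.g. peptide positions)
    keys, values: lists of Lk vectors (e.g. receptor positions)
    Returns (outputs, weights); weights[i][j] is the attention that
    query position i pays to key position j.
    """
    scale = math.sqrt(len(queries[0]))
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    dim_v = len(values[0])
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim_v)]
        for row in weights
    ]
    return outputs, weights


def toy_embed(sequence, dim, seed=0):
    """Hypothetical stand-in for a learned embedding: one fixed random
    vector per amino-acid type."""
    rng = random.Random(seed)
    table = {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}
    return [table[aa] for aa in sequence]


# Arbitrary example sequences, used only to exercise the function.
peptide = toy_embed("GILGFVFTL", dim=8)
cdr3 = toy_embed("CASSIRSSYEQYF", dim=8)

attended, attn = cross_attention(peptide, cdr3, cdr3)
print(len(attn), len(attn[0]))  # peptide length x receptor length
```

Each row of `attn` is a probability distribution over receptor positions for one peptide position. A position-by-position matrix of this kind is what an attention heatmap visualizes, although the scores reported in the paper come from the trained model rather than from random embeddings.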

Paper Structure (18 sections, 2 equations, 6 figures)

Figures (6)

  • Figure 1: Illustrative diagram of the UnifyImmun framework and two-phase training strategy, together with the sequence length distributions of the benchmark datasets. (a) Architecture of UnifyImmun based on the cross-attention mechanism. (b) Two-phase progressive training strategy. (c-d) Length distributions of the antigen sequences and TCR CDR3 sequences in our benchmark datasets.
  • Figure 2: Performance evaluation on predicting peptide-HLA binding specificity. (a) Performance comparison to twelve existing methods on the independent (left) and external (right) test sets. (b) ROC curves and AUC values achieved by UnifyImmun and eight competing methods on the hold-out independent test set. (c) UMAP feature visualization of peptide-HLA pairs. (d) Positive predictive value (PPV) for the top 100, top 1000, and top 5000 predicted pHLA samples.
  • Figure 3: Performance evaluation on predicting peptide-TCR binding specificity. (a-c) Performance comparison to four methods on the independent, external, and COVID-19 test sets, respectively. (d-f) Positive predictive value (PPV) for the top 100, top 1000, and top 5000 predicted samples on the independent, external, and COVID-19 test sets, respectively. (g-h) ROC curves and AUC values on the independent and external test sets, respectively.
  • Figure 4: Two-phase progressive training improved performance for both the pHLA and pTCR binding prediction tasks. (a-b) AUROC and AUPR values increased with two-phase training rounds on the pHLA independent test set. (c-d) AUROC and AUPR values increased with two-phase training rounds on the pTCR independent test set.
  • Figure 5: Heatmaps generated from cross-attention scores and integrated gradients. (a-b) Heatmaps of cross-attention scores and integrated gradients for each amino-acid type at each position of 9-mer peptides binding to HLA molecules. (c,f) Cumulative attention scores across the peptide length for each amino-acid type of peptides binding to HLA and TCR molecules, respectively. (d-e) Heatmaps of cross-attention scores and integrated gradients for each amino-acid type at each position of 9-mer peptides binding to TCR molecules. (g) Heatmaps of cross-attention scores for the five HLA alleles with the most 9-mer binding peptides. (h-i) Attention score-based heatmap and 3D structure for the TCR complex with HLA-B35:01/HPVG (PDB ID: 3MV7).
  • ...and 1 more figures
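
The captions for Figures 2 and 3 report positive predictive value (PPV) among the top-ranked predictions. A minimal sketch of the usual definition follows: rank samples by predicted score and take the fraction of true binders among the top k. The function name `ppv_at_k` and the toy scores and labels are assumptions for illustration, and the paper's exact evaluation procedure (for example, tie handling) may differ.

```python
def ppv_at_k(scores, labels, k):
    """Fraction of true positives among the k highest-scoring samples.

    scores: predicted binding scores (higher means more likely to bind)
    labels: ground-truth labels, 1 for binder and 0 for non-binder
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of samples")
    # Sort by score, highest first; Python's sort is stable, so ties keep
    # their original order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Toy data, not taken from the paper.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
print(ppv_at_k(scores, labels, 3))  # two of the top three are binders
```

With thresholds such as k = 100, 1000, and 5000, this metric measures how enriched the highest-confidence predictions are for true binders, which is what matters when only a handful of candidates can be tested experimentally.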