TCR-EML: Explainable Model Layers for TCR-pMHC Prediction

Jiarui Li, Zixiang Yin, Zhengming Ding, Samuel J. Landry, Ramgopal R. Mettu

TL;DR

TCR-EML introduces explainable model layers for TCR-pMHC binding by embedding a Feature Enhancement and Fusion (FEF) block and residue-level contact prototype layers into pretrained protein language model (PLM) backbones. The FEF block fuses CDRα, CDRβ, and peptide representations via cross-attention to produce enriched features, while the contact prototypes quantify residue contacts with a differentiable, thresholded similarity mechanism, yielding an interpretable binding score $\hat{y} = (w_{a,e} + w_{b,e})/2$. Evaluations on a large, diverse TCR-pMHC dataset and the TCR-XAI benchmark show competitive predictive accuracy and improved explainability (BRHR > 0.71–0.81 on key directions), with case studies validating alignment between predicted contacts and experimental structures. Importantly, TCR-EML can be paired with PLMs without full retraining, enabling faithful, mechanism-guided explanations that can aid vaccine design, immunotherapy, and autoimmune research.
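To make the scoring idea concrete, here is a minimal pure-Python sketch of a soft-thresholded similarity and the averaged binding score. It is not the authors' implementation. The summary does not define $w_{a,e}$ and $w_{b,e}$, so the sketch assumes they are per-chain contact scores (CDRα-epitope and CDRβ-epitope). The cosine similarity, the sigmoid used as a differentiable threshold, and the names `soft_threshold`, `chain_contact_score`, `tau`, and `sharpness` are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def soft_threshold(sim, tau=0.5, sharpness=10.0):
    """Smooth stand-in for a hard threshold at tau (assumed form).

    Returns values near 0 well below tau, near 1 well above it,
    and exactly 0.5 at sim == tau.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (sim - tau)))


def chain_contact_score(residue_features, prototype, tau=0.5):
    """Average soft-thresholded similarity of each residue to one prototype.

    `residue_features` stands in for fused per-residue features of one
    CDR chain; `prototype` stands in for a learned contact prototype.
    """
    scores = [soft_threshold(cosine(r, prototype), tau) for r in residue_features]
    return sum(scores) / len(scores)


def binding_score(w_ae, w_be):
    """Binding score as the mean of the two per-chain scores."""
    return (w_ae + w_be) / 2.0


# Toy 2-D features; real features would come from the PLM backbone.
prototype = [1.0, 0.0]
alpha_features = [[1.0, 0.0], [0.0, 1.0]]  # one aligned, one orthogonal residue
beta_features = [[1.0, 0.0], [1.0, 0.0]]   # both residues aligned

w_ae = chain_contact_score(alpha_features, prototype)
w_be = chain_contact_score(beta_features, prototype)
print(binding_score(w_ae, w_be))
```

Because each per-residue score stays in [0, 1] and is tied to a specific residue, the final score can be traced back to the residues that contributed to it, which is the property the prototype layers are meant to provide.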

Abstract

T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease. While recent advances in machine learning have improved prediction of TCR-pMHC binding, the most effective approaches are black-box transformer models that cannot provide a rationale for their predictions. Post-hoc explanation methods can provide insight with respect to the input, but they do not explicitly model the biochemical mechanisms underlying TCR-pMHC binding (e.g., known binding regions). "Explain-by-design" models (i.e., models with architectural components that can be examined directly after training) have been explored in other domains, but have not been used for TCR-pMHC binding. We propose explainable model layers (TCR-EML) that can be incorporated into protein-language model backbones for TCR-pMHC modeling. Our approach uses prototype layers for amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, enabling high-quality explanations for predicted TCR-pMHC binding. Experiments with our proposed method on large-scale datasets demonstrate competitive predictive accuracy and generalization, and evaluation on the TCR-XAI benchmark demonstrates improved explainability compared with existing approaches.

Paper Structure

This paper contains 20 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The explainable model layers include a Feature Enhancement and Fusion (FEF) block and contact prototype layers, which not only predict TCR-pMHC binding but also provide contact scores corresponding to contact distances. In the absence of experimental TCR-pMHC structures, the contact prototype illuminates TCR-pMHC binding patterns.
  • Figure 2: Overview of our explainable model layers for TCR-pMHC binding prediction. The Feature Enhancement and Fusion (FEF) block integrates information between TCR chains and TCR-peptide pairs. Contact prototype layers model residue-level contact areas and distances between CDR3 regions and the peptide.
  • Figure 3: ROC-AUC at a maximum false positive rate of 0.1 on the top-150 peptides in the test set. Results are reported for all PLM backbones (ESM-1b [rives2021biological], ESM-2 [lin2023evolutionary], and ProteinBERT [brandes2022proteinbert]) with either a linear classifier or our method, and compared against MixTCRpred [croce2024deep] and TULIP [meynard2024tulip] as comparable models.
  • Figure 4: Predicted versus experimental peptide-CDR3 contact distances for HLA-DR4-bound vimentin-64cit59-71 (PDB: 8TRR) [loh2024molecular]. TCR-EML predictions closely match experimental contacts, highlighting the model's explainability.
  • Figure 5: Average contact scores from ProteinBERT contact prototype layers. Positive samples show proximal contacts, whereas negative samples exhibit distal contacts.
  • ...and 1 more figure