TCR-EML: Explainable Model Layers for TCR-pMHC Prediction
Jiarui Li, Zixiang Yin, Zhengming Ding, Samuel J. Landry, Ramgopal R. Mettu
TL;DR
TCR-EML introduces explainable model layers for TCR-pMHC binding prediction by embedding a Feature Enhancement and Fusion (FEF) block and residue-level contact prototype layers into pretrained protein language model (PLM) backbones. The FEF module fuses CDRα, CDRβ, and peptide representations via cross-attention to produce enriched features, while the contact prototypes quantify residue contacts with a differentiable, thresholded similarity mechanism, yielding an interpretable binding score $\hat{y} = (w_{a,e} + w_{b,e})/2$. Evaluations on a large, diverse TCR-pMHC dataset and the TCR-XAI benchmark show competitive predictive accuracy and improved explainability (BRHR > 0.71–0.81 on key directions), with case studies validating alignment between predicted contacts and experimental structures. Importantly, TCR-EML can be paired with PLMs without full retraining, enabling faithful, mechanism-guided explanations that can aid vaccine design, immunotherapy, and autoimmune research.
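To make the score formula concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that $w_{a,e}$ and $w_{b,e}$ are per-chain contact weights between CDRα and the epitope and between CDRβ and the epitope; the summary above does not define them. The cosine similarity, the sigmoid soft threshold, the pooling rule, and the names `soft_contact_map`, `chain_weight`, and `binding_score` are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_contact_map(x, y, threshold=0.5, temperature=0.1):
    """Soft residue-contact map between two sets of residue embeddings.

    Assumption: similarity is cosine similarity, and the threshold is applied
    through a sigmoid so that the cut-off remains differentiable.
    """
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_n = y / np.linalg.norm(y, axis=1, keepdims=True)
    sim = x_n @ y_n.T  # shape (len_x, len_y), values in [-1, 1]
    return 1.0 / (1.0 + np.exp(-(sim - threshold) / temperature))


def chain_weight(contacts):
    """Hypothetical pooling: strongest contact per peptide residue, averaged."""
    return float(contacts.max(axis=0).mean())


def binding_score(cdr_a, cdr_b, peptide):
    """Average of the two per-chain weights: y_hat = (w_ae + w_be) / 2."""
    w_ae = chain_weight(soft_contact_map(cdr_a, peptide))
    w_be = chain_weight(soft_contact_map(cdr_b, peptide))
    return (w_ae + w_be) / 2.0


# Toy inputs: random embeddings standing in for PLM residue features.
rng = np.random.default_rng(0)
cdr_a = rng.normal(size=(12, 16))   # 12 CDR-alpha residues, 16-dim features
cdr_b = rng.normal(size=(14, 16))   # 14 CDR-beta residues
peptide = rng.normal(size=(9, 16))  # 9 peptide residues
score = binding_score(cdr_a, cdr_b, peptide)
```

Because each per-chain weight lies in [0, 1] under these assumptions, the averaged score does too, and the intermediate contact maps can be inspected residue by residue as the explanation for a given prediction.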
Abstract
T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease. While recent advances in machine learning have improved prediction of TCR-pMHC binding, the most effective approaches are black-box transformer models that cannot provide a rationale for their predictions. Post-hoc explanation methods can provide insight with respect to the input but do not explicitly model the biochemical mechanisms (e.g., known binding regions) that underlie TCR-pMHC binding. "Explain-by-design" models (i.e., models with architectural components that can be examined directly after training) have been explored in other domains, but have not been used for TCR-pMHC binding. We propose explainable model layers (TCR-EML) that can be incorporated into protein language model backbones for TCR-pMHC modeling. Our approach uses prototype layers for amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, enabling high-quality explanations for predicted TCR-pMHC binding. Experiments with our proposed method on large-scale datasets demonstrate competitive predictive accuracy and generalization, and evaluation on the TCR-XAI benchmark demonstrates improved explainability compared with existing approaches.
