MATE-Pred: Multimodal Attention-based TCR-Epitope interaction Predictor
Etienne Goffinet, Raghvendra Mall, Ankita Singh, Rahul Kaushik, Filippo Castiglione
TL;DR
The paper tackles predicting TCR-epitope binding affinity, a key step in guiding immunotherapies. It introduces MATE-Pred, a multimodal attention-based predictor that, for each sequence, fuses textual amino acid (AA) embeddings with physicochemical descriptors and predicted contact maps via early fusion in dual encoders. The approach achieves state-of-the-art performance on large-scale training data and generalizes robustly to a challenging independent test set, with notable gains in MCC and AUC. The work highlights the value of integrating multiple modalities and points to future enhancements with structural data and additional modalities, while providing code and datasets for community use.
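To make the idea of early fusion concrete, the sketch below concatenates three per-residue feature blocks (a language-model embedding, physicochemical descriptors, and a row of the predicted contact map) into one token matrix before any encoder sees them. This is only an illustration under stated assumptions: the names `pad_rows` and `early_fuse`, the zero-padding of contact-map rows to a fixed width, and the toy values are hypothetical and are not taken from the MATE-Pred code.

```python
from typing import List

Matrix = List[List[float]]


def pad_rows(matrix: Matrix, width: int) -> Matrix:
    """Right-pad (or truncate) every row to a fixed width with zeros."""
    return [(row + [0.0] * width)[:width] for row in matrix]


def early_fuse(embeddings: Matrix, physchem: Matrix,
               contacts: Matrix, max_len: int) -> Matrix:
    """Concatenate the three per-residue feature blocks row by row.

    Assumption: fusion is a plain feature concatenation per residue;
    the actual model may fuse the modalities differently.
    """
    n = len(embeddings)
    if not (len(physchem) == n and len(contacts) == n):
        raise ValueError("all modalities must describe the same residues")
    padded = pad_rows(contacts, max_len)
    return [embeddings[i] + physchem[i] + padded[i] for i in range(n)]


# Toy 3-residue sequence: 2-d embedding, 1 physicochemical value,
# and a 3x3 contact map (illustrative numbers only).
emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
phys = [[1.0], [0.0], [-1.0]]
cmap = [[0.0, 3.8, 6.1], [3.8, 0.0, 3.8], [6.1, 3.8, 0.0]]

fused = early_fuse(emb, phys, cmap, max_len=4)
```

In a dual-encoder setup, the same fusion step would be applied separately to the TCR and the epitope sequence, and each fused matrix would feed its own encoder.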
Abstract
Accurate prediction of the binding affinity between T-cell receptors and epitopes contributes decisively to the development of successful immunotherapy strategies. Some state-of-the-art computational methods apply deep learning with evolutionary features to convert the amino acid residues of T-cell receptor and epitope sequences into numerical values, while others employ pre-trained language models and summarize residue-level embedding vectors into sequence-wise representations. Here, we propose MATE-Pred, a novel and highly reliable method that performs multi-modal attention-based prediction of the binding affinity between T-cell receptors and epitopes. MATE-Pred is benchmarked against other deep learning models that leverage multi-modal representations of T-cell receptors and epitopes. In the proposed method, the textual representation of proteins is embedded with a pre-trained bi-directional encoder model and combined with two additional modalities: a) a comprehensive set of selected physicochemical properties; b) predicted contact maps that estimate the 3D distances between amino acid residues in the sequences. MATE-Pred demonstrates the potential of multi-modal models to achieve state-of-the-art performance (+8.4% MCC, +5.5% AUC compared to baselines) and to efficiently capture contextual, physicochemical, and structural information from amino acid residues. This performance suggests potential applications in various drug discovery settings.
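For readers less familiar with the Matthews correlation coefficient (MCC) reported above, the following minimal sketch computes it from the four cells of a binary confusion matrix, using the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `mcc` and the counts in the example are illustrative assumptions, not results from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for binary classification.

    Returns 0.0 when any marginal is empty (denominator of zero),
    which is a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Made-up counts for binder / non-binder predictions.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
inverted = mcc(tp=0, tn=0, fp=50, fn=50)
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), and it accounts for all four confusion-matrix cells, which makes it informative when binders and non-binders are imbalanced.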
