Physicochemically Informed Dual-Conditioned Generative Model of T-Cell Receptor Variable Regions for Cellular Therapy
Jiahao Ma, Hongzong Li, Ye-Fan Hu, Jian-Dong Huang
TL;DR
PhysicoGPTCR tackles the problem of generating TCR variable regions that are novel, diverse, and biophysically plausible within a given peptide–MHC context. It introduces a dual-conditioned Transformer that fuses peptide and HLA inputs with residue-level physicochemical embeddings to model $p_\theta(t \mid m, p)$ end to end. Across multiple benchmarks, it outperforms the baselines on string-based metrics and yields a higher proportion of docking-competent clones, as assessed through in-silico analyses and case studies. This approach promises to shorten the TCR discovery timeline from months to minutes while maintaining downstream verifiability, enabling rapid, personalized cellular therapies.
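As a reading aid, the conditional distribution above can be written out under the autoregressive, maximum-likelihood setup described in the abstract. This is a sketch under stated assumptions, not the authors' exact formulation: $t = (t_1, \dots, t_L)$ is taken to be the TCR residue sequence of length $L$, $m$ the HLA (MHC) context, $p$ the peptide, and $\mathcal{D}$ the set of training triples; the symbols $L$ and $\mathcal{D}$ do not appear in the source.

\[
p_\theta(t \mid m, p) = \prod_{i=1}^{L} p_\theta\bigl(t_i \mid t_{<i},\, m,\, p\bigr),
\qquad
\mathcal{L}(\theta) = -\sum_{(t, m, p) \in \mathcal{D}} \sum_{i=1}^{L} \log p_\theta\bigl(t_i \mid t_{<i},\, m,\, p\bigr).
\]

Minimising $\mathcal{L}(\theta)$ is equivalent to maximising the likelihood of the observed TCR sequences given their peptide and HLA context.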
Abstract
Physicochemically informed biological sequence generation has the potential to accelerate computer-aided cellular therapy, yet current models fail to \emph{jointly} ensure novelty, diversity, and biophysical plausibility when designing variable regions of T-cell receptors (TCRs). We present \textbf{PhysicoGPTCR}, a large generative protein Transformer that is \emph{dual-conditioned} on peptide and HLA context and trained to autoregressively synthesise TCR sequences while embedding residue-level physicochemical descriptors. The model is optimised on curated TCR--peptide--HLA triples with a maximum-likelihood objective and compared against ANN, GPTCR, LSTM, and VAE baselines. Across multiple neoantigen benchmarks, PhysicoGPTCR substantially improves edit-distance, similarity, and longest-common-subsequence scores, while populating a broader region of sequence space. Blind in-silico docking and structural modelling further reveal a higher proportion of binding-competent clones than the strongest baseline, validating the benefit of explicit context conditioning and physicochemical awareness. Experimental results demonstrate that dual-conditioned, physics-grounded generative modelling enables end-to-end design of functional TCR candidates, reducing the discovery timeline from months to minutes without sacrificing wet-lab verifiability.
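The string-based scores named above (edit distance, similarity, longest common subsequence) can be illustrated with a minimal Python sketch. The function names are hypothetical, the two sequences are toy strings rather than data from the paper, and the normalised similarity used here (one minus edit distance over the longer length) is an assumption, since the abstract does not define the similarity score.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed definition: 1 - edit_distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Toy sequences for illustration only; they differ by one substitution.
    generated = "CASSLGQAYEQYF"
    reference = "CASSLGGAYEQYF"
    print(edit_distance(generated, reference))
    print(lcs_length(generated, reference))
    print(round(similarity(generated, reference), 3))
```

Lower edit distance and higher similarity or LCS length indicate a generated sequence closer to the reference; how the paper aggregates these scores over a benchmark is not specified in the abstract.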
