Classification of autoimmune diseases from peripheral blood TCR repertoires by multimodal multi-instance learning
Ruihao Zhang, Mao Chen, Fei Ye, Dandan Meng, Yixuan Huang, Xiao Liu
TL;DR
This work addresses autoimmune disease diagnosis from peripheral blood TCR repertoires by introducing EAMil, a multimodal multi-instance learning (MIL) framework that combines PrimeSeq-based sequence selection with ESMonehot encoding and an enhanced gate attention mechanism. Through dual training at the sample and instance levels with pseudo-labeling, EAMil achieves state-of-the-art performance for SLE and RA, with AUCs of $98.95\%$ for SLE and $97.76\%$ for RA, and can stratify SLE patients by disease activity (SLEDAI) and organ-damage status. The model also identifies disease-associated genes with over 90% concordance with established differential analyses and demonstrates robustness to age and sex confounders. These results establish EAMil as a powerful, interpretable tool for immune-receptor-based diagnostics with potential clinical impact across immune-mediated diseases.
Abstract
T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates, meaning that only a small fraction of the sequences in a repertoire are disease-associated. We developed EAMil, a multi-instance deep learning framework that uses TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA). By integrating PrimeSeq feature extraction with ESMonehot encoding and an enhanced gate attention mechanism, our model achieved state-of-the-art performance, with AUCs of 98.95% for SLE and 97.76% for RA. EAMil identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model remained robust across multiple classification tasks: it stratified SLE patients by disease severity using the SLEDAI score, identified the site of organ damage in SLE patients, and controlled for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification, with broad potential clinical applications across immune-mediated conditions.
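To make the multi-instance setting concrete, the sketch below shows standard gated attention pooling, in which each TCR sequence in a sample is an instance and the attention weights indicate which sequences drive the sample-level representation. This is an illustrative assumption about the general technique, not the enhanced gate attention used in EAMil: the function name `gated_attention_pool`, the parameter matrices `V`, `U`, `w`, and all dimensions are hypothetical, and the random embeddings stand in for encoded sequences.

```python
import numpy as np


def gated_attention_pool(H, V, U, w):
    """Pool instance embeddings into one bag embedding (illustrative sketch).

    H: (n_instances, d) instance embeddings, one row per TCR sequence.
    V, U: (d, k) projection matrices for the tanh and sigmoid branches.
    w: (k,) vector mapping the gated hidden state to a scalar score.
    Returns the bag embedding (d,) and the attention weights (n_instances,).
    """
    # Sigmoid gate modulates the tanh branch element-wise.
    gate = 1.0 / (1.0 + np.exp(-(H @ U)))
    scores = (np.tanh(H @ V) * gate) @ w

    # Numerically stable softmax over the instances of the bag.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights /= weights.sum()

    # Bag embedding is the attention-weighted sum of instance embeddings.
    bag = weights @ H
    return bag, weights


# Hypothetical toy bag: 5 sequences, 8-dimensional embeddings, hidden size 4.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
V = rng.normal(size=(8, 4))
U = rng.normal(size=(8, 4))
w = rng.normal(size=4)

bag, weights = gated_attention_pool(H, V, U, w)
```

In a trained model, the bag embedding would feed a sample-level classifier, and instances with large attention weights are the natural candidates for interpretation, which is one way a framework of this kind can point to disease-associated sequences.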
