Classification of autoimmune diseases from peripheral blood TCR repertoires by multimodal multi-instance learning

Ruihao Zhang, Mao Chen, Fei Ye, Dandan Meng, Yixuan Huang, Xiao Liu

TL;DR

This work addresses autoimmune disease diagnosis from peripheral blood TCR repertoires by introducing EAMIL, a multimodal multi-instance learning (MIL) framework that combines PrimeSeq-based sequence selection with ESMonehot encoding and an enhanced gate attention mechanism. Trained jointly at the sample and instance levels with pseudo-labeling, EAMIL reports state-of-the-art AUCs of $98.95\%$ for SLE and $97.76\%$ for RA, and can stratify SLE patients by disease activity (SLEDAI) and organ-damage status. The model also identifies disease-associated genes with over 90% concordance with established differential analyses and remains robust to age and sex as confounders. These results position EAMIL as an interpretable tool for immune-receptor-based diagnostics with potential clinical use across immune-mediated diseases.

Abstract

T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates. We developed EAMil, a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional accuracy. By integrating PrimeSeq feature extraction with ESMonehot encoding and enhanced gate attention mechanisms, our model achieved state-of-the-art performance with AUCs of 98.95% for SLE and 97.76% for RA. EAMil successfully identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model demonstrated robustness in classifying multiple disease categories, utilizing the SLEDAI score to stratify SLE patients by disease severity as well as to diagnose the site of damage in SLE patients, and effectively controlling for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification with broad potential clinical applications across immune-mediated conditions.

Paper Structure

This paper contains 22 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Framework of the EAMIL model. (A) Feature extraction: ESMonehot encodes PrimeSeq-selected key sequences into high-dimensional vectors. (B) Multi-instance learning: an enhanced gate attention mechanism with a pooling and clustering strategy uses bag-level labels as pseudo-labels to optimize sequence-level and sample-level feature learning. (C) Dual attention: gated and spatial attention mechanisms enable multi-level modeling of global and local features. (D) Attention-based labeling: the top-k sequences receive pseudo-labels based on attention scores from the enhanced gate attention mechanism.
  • Figure 2: Comparison of results with existing deep learning methods. The figure illustrates the comparative performance of our model against DeepTCR (in small, medium, and large settings) and DeepTAPE on two disease cases using the AUC metric.
  • Figure 3: Visual analysis of significant genes and features. To identify significant genes in SLE and RA patients, the top 10 sequences cumulatively identified by our model were counted and visualized as bubble plots (A, B). Each bubble represents a gene, with higher scores indicating greater significance. Genes that align with findings from previous studies are highlighted with red stars. Furthermore, t-SNE plots display the distribution of SLE-encoded VCDR3 features (C) and RA-encoded VCDR3 features (D) extracted from the attention module for SLE controls and RA controls, respectively.
  • Figure 4: Results of analysis experiments. (A) One-vs-others experiment results, illustrating the classification performance for SLE, RA, and Control groups. (B) Diagnostic analysis identifying damaged body parts in SLE patients, with affected areas labeled. (C) Sex-based analysis of SLE patients, highlighting differences across genders. (D) Age group-based analysis of SLE patients, showcasing variations across different age groups. Radar charts show five-fold cross-validation results, adjusted by subtracting 50%. (E) Identification of Active SLE patients compared to healthy samples. (F) Identification of Silent SLE patients compared to healthy samples.
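
The multi-instance step described in Figure 1 (B, D) can be illustrated with a minimal sketch. It assumes the standard gated attention pooling used in attention-based MIL (a tanh branch multiplied element-wise by a sigmoid gate, followed by a softmax over instances), not the paper's "enhanced" variant, whose details are not given here. All names (`gated_attention_weights`, `pool_bag`, `top_k_pseudo_labels`), the random weights, and the dimensions are hypothetical and chosen only for illustration.

```python
import math
import random


def gated_attention_weights(instances, V, U, w):
    """Return one softmax-normalised attention weight per instance.

    instances: list of feature vectors (one per TCR sequence in the bag)
    V, U:      hidden_dim x feat_dim matrices (tanh branch and sigmoid gate)
    w:         hidden_dim vector mapping the gated hidden state to a scalar
    """
    logits = []
    for h in instances:
        tanh_branch = [math.tanh(sum(a * b for a, b in zip(row, h))) for row in V]
        gate_branch = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, h))))
                       for row in U]
        logits.append(sum(wi * t * g for wi, t, g in zip(w, tanh_branch, gate_branch)))
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pool_bag(instances, weights):
    """Attention-weighted sum of instance vectors: the sample-level embedding."""
    dim = len(instances[0])
    return [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]


def top_k_pseudo_labels(weights, bag_label, k):
    """Give the bag-level label to the k instances with the highest attention."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return {i: bag_label for i in ranked[:k]}


# Toy bag: 20 sequences with 8-dimensional features and random (untrained) weights.
rng = random.Random(0)
feat_dim, hidden_dim, n_seqs = 8, 4, 20
bag = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(n_seqs)]
V = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(hidden_dim)]
U = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(hidden_dim)]
w = [rng.uniform(-1, 1) for _ in range(hidden_dim)]

weights = gated_attention_weights(bag, V, U, w)
bag_embedding = pool_bag(bag, weights)
pseudo = top_k_pseudo_labels(weights, bag_label=1, k=5)
```

In a trained model, `V`, `U`, and `w` would be learned, `bag_embedding` would feed the sample-level classifier, and the pseudo-labelled top-k sequences would supply the instance-level training signal.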