Explainable AI model reveals disease-related mechanisms in single-cell RNA-seq data

Mohammad Usman, Olga Varea, Petia Radeva, Josep Canals, Jordi Abante, Daniel Ortiz

TL;DR

This study addresses the challenge of interpreting neurodegenerative disease mechanisms from single-cell data by integrating a neural-network classifier with SHAP-based explainable AI to identify HD-associated genes at single-cell resolution. The approach compares SHAP-informed gene importance with traditional DESeq2 differential expression, followed by GSEA to reveal affected pathways in direct- and indirect-pathway SPNs. Results show both overlap and divergence between the methods, with SHAP uncovering additional HD-relevant genes and pathways not captured by DGE alone, thereby offering a broader mechanistic view. The framework demonstrates the value of XAI for extracting actionable, cell-type–specific insights from single-cell transcriptomics and can be extended to multi-omics and other diseases.

Abstract

Neurodegenerative diseases (NDDs) are complex and lack effective treatments because their mechanisms are poorly understood. Single-nucleus RNA sequencing (snRNA-seq) is increasingly used to explore transcriptomic events at single-cell resolution, yet interpreting the mechanisms underlying a disease from such data remains challenging. Neural network (NN) models can handle complex data and offer insights, but they are often regarded as black boxes with poor interpretability. In this context, explainable AI (XAI) emerges as a solution that could help to understand disease-associated mechanisms when combined with efficient NN models. However, limited research explores XAI in single-cell data. In this work, we implement a method for identifying disease-related genes and mechanistic explanations of disease progression based on an NN model combined with SHAP. We analyze available Huntington's disease (HD) data to identify both HD-altered genes and mechanisms, adding Gene Set Enrichment Analysis (GSEA) and comparing two methods: differential gene expression analysis (DGE) and the NN combined with SHAP. Our results show that the DGE and SHAP approaches yield both shared and distinct sets of altered genes and pathways, reinforcing the usefulness of XAI methods for a broader perspective on disease.
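To make the attribution idea concrete, the following is a minimal sketch of the Shapley values that SHAP approximates, computed exactly by enumerating feature subsets. It is not the authors' implementation: the three gene names, the `toy_logit` scoring function, the expression values, and the reference profile are all hypothetical stand-ins for a trained NN classifier and real snRNA-seq data.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; 'absent' features take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical gene names and a made-up scoring function (assumption, not from the paper)
GENES = ["gene_a", "gene_b", "gene_c"]


def toy_logit(expr):
    a, b, c = expr
    return 1.5 * a - 2.0 * b + 0.5 * a * c


hd_cell = [2.0, 0.5, 1.0]      # made-up expression of one HD cell
reference = [1.0, 1.0, 1.0]    # made-up reference profile, e.g. mean control expression

phi = shapley_values(toy_logit, hd_cell, reference)
for gene, value in zip(GENES, phi):
    print(f"{gene}: {value:+.3f}")
```

The attributions sum to the difference between the score for the cell and the score for the reference profile, which is the property that lets per-gene contributions be read as a decomposition of the classifier output. Exact enumeration scales exponentially with the number of features, so it is only practical for toy inputs like this one.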

Paper Structure (13 sections, 1 equation, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Single-nuclei RNA-seq mouse data. a. Integration of single-nuclei RNA-seq data, with colors representing the cell type identified using cluster markers. Here, we focus on the spiny projection neurons (SPNs), the type of neuron that is primarily affected by Huntington's disease. b. Cell count distribution split according to condition.
  • Figure 2: Diagram of the single-cell analysis workflow. Samples from the R6/2 Huntington's disease (HD) mouse model and from non-transgenic (NT), i.e. wild-type (WT), mice were collected from the brain at two developmental stages, 8 and 12 weeks old. Single-nucleus RNA sequencing (a) is performed to generate a cell atlas (b) for both conditions, WT and R6/2. Subsequently, we perform differential expression analysis (c). An NN model (d) is trained on these two conditions and combined with explainable AI (e) to identify potentially altered genes and understand disease mechanisms.
  • Figure 3: Bar plot displaying the top 20 DEGs from DESeq2, ranked by absolute LFC, for clusters iSPN (left) and dSPN (right). Bars are colour-coded to indicate genes upregulated (blue) and downregulated (red) in HD.
  • Figure 4: SHAP summary plot showing gene importance ordered by mean absolute SHAP values for HD cells in clusters iSPN (left) and dSPN (right).
  • Figure 5: Venn diagram illustrating the overlap between differentially expressed genes identified by DESeq2 and informative genes identified by SHAP values. The diagrams show the intersection of these gene sets when applying thresholds on the mean absolute SHAP values that correspond to its quartiles for (a) cluster iSPN, with SHAP value $>$ first quartile and SHAP value $>$ second quartile, and (b) cluster dSPN, with SHAP value $>$ first quartile and SHAP value $>$ second quartile.
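
The comparisons described in Figures 3 to 5 (ranking DEGs by absolute LFC, ranking genes by mean absolute SHAP value, and intersecting the two gene sets at quartile thresholds) can be sketched as follows. This is only an illustration under stated assumptions: the gene names, LFC values, SHAP values, and DEG set are made up, and the inclusive quartile method is an arbitrary choice, since the captions do not specify how the quartiles were computed.

```python
from statistics import quantiles

# Hypothetical inputs (assumptions): per-cell SHAP values (rows = cells, columns = genes)
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
shap_rows = [
    [0.5, -0.1, 0.3, 0.0, 0.2, -0.7, 0.05, 0.9],
    [-0.3, 0.3, 0.3, 0.1, -0.8, 0.5, 0.15, -0.7],
]
# Hypothetical DEGs with made-up log fold changes, standing in for DESeq2 output
deg_lfc = {"g1": 1.8, "g2": -2.5, "g6": 0.9, "g7": -1.1}


def top_by_abs(values, k):
    """Return the k keys with the largest absolute value (as in a top-k LFC bar plot)."""
    return sorted(values, key=lambda g: abs(values[g]), reverse=True)[:k]


def mean_abs_shap(rows, names):
    """Mean absolute SHAP value per gene across cells."""
    n_cells = len(rows)
    return {g: sum(abs(row[j]) for row in rows) / n_cells for j, g in enumerate(names)}


importance = mean_abs_shap(shap_rows, genes)
# Quartile cut points of the mean |SHAP| distribution (inclusive method is an assumption)
q1, q2, q3 = quantiles(importance.values(), n=4, method="inclusive")

deg_set = set(deg_lfc)
shap_above_q1 = {g for g, v in importance.items() if v > q1}
shap_above_q2 = {g for g, v in importance.items() if v > q2}

overlap_q1 = deg_set & shap_above_q1      # genes found by both methods at the Q1 threshold
overlap_q2 = deg_set & shap_above_q2      # genes found by both methods at the Q2 threshold
shap_only_q2 = shap_above_q2 - deg_set    # genes flagged by SHAP but not by DGE
deg_only_q1 = deg_set - shap_above_q1     # genes flagged by DGE but not by SHAP

print("Top DEGs by |LFC|:", top_by_abs(deg_lfc, 2))
print("Overlap (SHAP > Q1):", sorted(overlap_q1))
print("Overlap (SHAP > Q2):", sorted(overlap_q2))
print("SHAP only (> Q2):", sorted(shap_only_q2))
```

Raising the threshold from the first to the second quartile shrinks the SHAP gene set, so the overlap with the DEG set can only stay the same or decrease, while the SHAP-only and DEG-only sets capture where the two methods diverge.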