eDOC: Explainable Decoding Out-of-domain Cell Types with Evidential Learning

Chaochen Wu, Meiyun Zuo, Lei Xie

TL;DR

eDOC leverages a transformer architecture with evidential learning to annotate In-Domain (IND) and Out-of-Domain (OOD) cell types and to highlight the genes that contribute to both IND and OOD cells at single-cell resolution, suggesting that eDOC may provide new insights into single-cell biology.

Abstract

Single-cell RNA-seq (scRNA-seq) technology is a powerful tool for unraveling the complexity of biological systems. One of the essential and fundamental tasks in scRNA-seq data analysis is Cell Type Annotation (CTA). Despite tremendous efforts to develop machine learning methods for this problem, several challenges remain. They include identifying Out-of-Domain (OOD) cell types, quantifying the uncertainty of annotations for unseen cell types, and determining interpretable, cell type-specific gene drivers for an OOD case. OOD cell types are often associated with therapeutic responses and disease origins, making them critical for precision medicine and early disease diagnosis. Additionally, scRNA-seq data contain expression values for tens of thousands of genes. Pinpointing the gene drivers underlying CTA can provide deep insight into gene regulatory mechanisms, and these drivers can serve as disease biomarkers. In this study, we develop a new method, eDOC, to address the aforementioned challenges. eDOC leverages a transformer architecture with evidential learning to annotate In-Domain (IND) and OOD cell types and to highlight the genes that contribute to both IND and OOD cells at single-cell resolution. Rigorous experiments demonstrate that eDOC significantly improves the efficiency and effectiveness of OOD cell type and gene driver identification compared to other state-of-the-art methods. Our findings suggest that eDOC may provide new insights into single-cell biology.

Paper Structure

This paper contains 19 sections, 13 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: eDOC architecture and illustration of evidential learning to find OOD cells and detect marker genes in OOD cells. If adding Gene 1 causes the uncertainty to drop significantly, Gene 1 is one of the marker genes for IND cell types. Adding Gene 3 increases the uncertainty, so Gene 3 is one of the marker genes for OOD cells. Whether a cell is an OOD or IND cell depends on the last position in the transformer, where all genes are visible to the model.
  • Figure 2: Two examples of uncertainty score changes. In each plot, we randomly select 5 OOD cells and 5 IND cells and plot how the uncertainty score $u_n$ changes with the number of genes $n$. Each line represents a cell.
  • Figure 3: Illustration of genes that are highlighted by eDOC for IND (left) and OOD (right) experiments.
  • Figure 4: Examples of marker genes identified by eDOC (top) and by attention weights (bottom) for three different IND cells. Each heatmap panel represents a cell with five marker genes for that cell. Heatmaps in the top row show eDOC's $u_n^{IND}$ over 40 runs, and the number in each heatmap panel represents the $n$th position. Heatmaps in the bottom row show attention weights for all 8 heads of the Transformer model.
  • Figure 5: Examples of marker genes identified by eDOC for three different OOD cells. Each heatmap panel represents a cell with five marker genes for that cell. The heatmap shows eDOC's $u_n^{OOD}$ over 20 runs, and the number in each heatmap cell represents the $n$th position.
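
The idea in the Figure 1 caption, where a gene is flagged by how the uncertainty moves when it becomes visible, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the common Dirichlet-based evidential formulation ($\alpha = e + 1$, $u = K / \sum_k \alpha_k$), which this section does not state explicitly. The function names and the toy evidence values are hypothetical.

```python
import numpy as np


def dirichlet_uncertainty(evidence):
    """Uncertainty u = K / S for non-negative per-class evidence.

    evidence: array of shape (n_positions, n_classes), one row per
    transformer position (i.e. after each additional gene is visible).
    Assumption: alpha = evidence + 1 and S = sum(alpha), the usual
    evidential-learning convention; eDOC's exact formula may differ.
    """
    evidence = np.asarray(evidence, dtype=float)
    alpha = evidence + 1.0
    strength = alpha.sum(axis=-1)
    return evidence.shape[-1] / strength


def rank_genes_by_uncertainty_change(u, gene_names):
    """Sort genes from the largest uncertainty drop to the largest rise."""
    u = np.asarray(u, dtype=float)
    # Change attributed to gene n is u_n - u_{n-1}. Before any gene is
    # visible there is zero evidence, so the starting uncertainty is 1.
    delta = np.diff(u, prepend=1.0)
    order = np.argsort(delta)
    return [(gene_names[i], float(delta[i])) for i in order]


# Toy values, made up purely for illustration: 3 classes, 3 genes.
evidence = np.array([
    [9.0, 0.0, 0.0],  # after Gene 1: strong evidence for class 0
    [9.0, 0.0, 0.0],  # after Gene 2: nothing new
    [1.0, 1.0, 1.0],  # after Gene 3: evidence no longer favours any class
])
u = dirichlet_uncertainty(evidence)
ranking = rank_genes_by_uncertainty_change(u, ["Gene 1", "Gene 2", "Gene 3"])
```

In this toy run, Gene 1 produces the largest drop in uncertainty and would be read as an IND marker candidate, while Gene 3 raises the uncertainty and would be read as an OOD marker candidate, mirroring the caption's example.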