
scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Wenwen Min, Zhen Wang, Fangfang Zhu, Taosheng Xu, Shunfang Wang

TL;DR

The paper tackles the challenge of clustering sparse and noisy scRNA-seq data by proposing scASDC, a deep clustering framework that jointly learns content information from a ZINB-based autoencoder and high-order cell relationships from a graph autoencoder. These two sources are fused using a layer-wise attention mechanism and reinforced by a self-supervised objective, enabling end-to-end clustering. Across six diverse datasets, scASDC outperforms seven baselines in NMI and ARI, with ablations confirming the contribution of each module. The approach yields robust cell-type delineation and supports downstream biological interpretation, advancing accurate analysis of cellular heterogeneity in scRNA-seq data.
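The comparison above is reported in NMI and ARI, two external metrics that score a predicted clustering against reference cell-type labels and are invariant to how the clusters are numbered. The dependency-free sketch below implements the standard ARI and an NMI with arithmetic-mean normalization, which is the default of scikit-learn's `normalized_mutual_info_score`. Which normalization the paper actually used is an assumption here. In practice one would simply call `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.normalized_mutual_info_score`.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between reference labels and predicted cluster labels."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # n_ij: number of cells with reference label i assigned to cluster j.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_ij - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization: I(U;V) / ((H(U) + H(V)) / 2)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    a, b = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        c / n * log(c * n / (a[i] * b[j]))
        for (i, j), c in Counter(zip(labels_true, labels_pred)).items()
    )
    h_a = -sum(c / n * log(c / n) for c in a.values())
    h_b = -sum(c / n * log(c / n) for c in b.values())
    denom = (h_a + h_b) / 2
    if denom == 0:
        return 1.0  # both labelings put every cell in one cluster
    return mi / denom


truth = [0, 0, 1, 1]
pred = [0, 0, 1, 2]  # one reference group split into two clusters
print(round(adjusted_rand_index(truth, pred), 3), round(normalized_mutual_info(truth, pred), 3))
# prints: 0.571 0.8
```

Both functions return 1.0 for a perfect clustering, even when the cluster IDs are permuted. ARI can drop below zero when agreement is worse than chance.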

Abstract

Single-cell RNA sequencing (scRNA-seq) data analysis is pivotal for understanding cellular heterogeneity. However, the high sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods. To address these issues, we propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC), which integrates multiple advanced modules to improve clustering accuracy and robustness. Our approach employs a multi-layer graph convolutional network (GCN), termed the graph autoencoder module, to capture high-order structural relationships between cells. To mitigate the oversmoothing issue in GCNs, we introduce a ZINB-based autoencoder module that extracts content information from the data and learns latent representations of gene expression. These modules are further integrated through an attention fusion mechanism, ensuring effective combination of gene expression and structural information at each layer of the GCN. Additionally, a self-supervised learning module is incorporated to enhance the robustness of the learned embeddings. Extensive experiments demonstrate that scASDC outperforms existing state-of-the-art methods, providing a robust and effective solution for single-cell clustering tasks. Our method paves the way for more accurate and meaningful analysis of single-cell RNA sequencing data, contributing to a better understanding of cellular heterogeneity and biological processes. All code and public datasets used in this paper are available at https://github.com/wenwenmin/scASDC and https://zenodo.org/records/12814320.
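The ZINB-based autoencoder models each count with a zero-inflated negative binomial (ZINB) distribution. Its decoder outputs three parameters: a dropout probability π, a mean μ, and a dispersion θ. The sketch below shows the per-entry ZINB negative log-likelihood that decoders of this kind (for example DCA-style autoencoders) are typically trained to minimize. It is a minimal illustration under that assumption. Details not stated here, such as size-factor scaling of μ or a regularizer on π, are omitted. The function names are hypothetical.

```python
import math


def nb_log_pmf(x, mu, theta):
    """log NB(x; mu, theta): negative binomial with mean mu and dispersion theta."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Negative log-likelihood of one count x under ZINB(pi, mu, theta)."""
    mu = mu + eps  # keep log(mu) finite if the decoder outputs mu = 0
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from the dropout component (prob. pi) or from the NB itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb) + eps)
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi + eps) + log_nb)


def mean_zinb_nll(counts, mus, thetas, pis):
    """Average ZINB NLL over flattened (cell, gene) entries, usable as a reconstruction loss."""
    terms = [zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis)]
    return sum(terms) / len(terms)


# With theta = 1 the NB reduces to a geometric law, so P(0) = 0.5 when mu = 1.
print(round(zinb_nll(0, mu=1.0, theta=1.0, pi=0.0), 4))  # prints: 0.6931, i.e. -log(0.5)
```

Raising π makes an observed zero cheaper to explain. This is how the model can attribute excess zeros to dropout instead of distorting μ.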

Paper Structure (19 sections, 21 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Model framework of scASDC. The scASDC framework is primarily composed of a ZINB-based autoencoder module, a graph autoencoder module, an attention fusion module, and a self-supervised module. The framework uses the attention mechanism to embed and transmit the outputs of the two autoencoder modules layer by layer. This ensures that the resulting embedding retains both the content and the structural information of the original data. Additionally, the self-supervised module integrates the networks into a single framework, enabling synchronous end-to-end updates. In this framework, $\mathbf{\bar{X}}$ represents the input data, and $\mathbf{A}$ denotes the original cell graph. $\mathbf{H}_l$ and $\mathbf{Z}_l$ denote the outputs of the $l$-th layer of the ZINB-based autoencoder module and of the graph autoencoder module, respectively. The parameters $\pi$, $\mu$, and $\theta$ are the three parameters of the ZINB distribution (dropout probability, mean, and dispersion).
  • Figure 2: (A) Average NMI and ARI values for different values of the fusion parameter $\alpha$. (B) Average NMI and ARI values for different numbers of genes. (C) Average NMI and ARI values for different values of the neighbor parameter $k$.
  • Figure 3: 2D UMAP visualizations comparing the clustering results on six datasets.
  • Figure 4: (A) Violin plots showing the expression distributions of six representative genes that are significantly expressed in different cell types. Each violin plot shows the expression levels of one gene across cell types, highlighting the distinct expression of these genes in individual cell clusters. (B) Heatmap of the expression patterns of multiple genes across cell types. Each row represents a gene and each column represents a single cell, revealing the expression pattern of specific genes within each cell cluster.
  • Figure 5: Gene function pathway analysis and expression visualization of Cluster1 (T cells) vs. Cluster3 (B cells), as identified by scASDC.
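Figure 1 describes three steps. First, a cell graph $\mathbf{A}$ is built; Figure 2(C) shows it depends on a neighbor parameter $k$. Next, at each layer $l$ the content embedding $\mathbf{H}_l$ and the structural embedding $\mathbf{Z}_l$ are fused with attention. Finally, the fused result is propagated through the GCN. The dependency-free sketch below is one plausible reading of that description, not the authors' implementation. Several choices are assumptions: a Euclidean kNN graph, symmetric GCN normalization, a per-cell softmax over two tanh-scored sources with a shared scoring vector `score_vec`, and the way the fused embedding feeds the next layer. All helper names are hypothetical. A real model would use a tensor library and learn `score_vec` and the layer weights.

```python
import math


def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def knn_graph(features, k):
    """Symmetric k-nearest-neighbor cell graph (Euclidean distance, no self-loops)."""
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(features[i], features[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_self = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_self]
    return [[inv_sqrt_deg[i] * a_self[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def attention_fuse(h_l, z_l, score_vec):
    """Per-cell softmax attention over content (H_l) and structure (Z_l) embeddings.

    Returns the fused embeddings and the (weight_H, weight_Z) pair for each cell.
    """
    fused, weights = [], []
    for h_row, z_row in zip(h_l, z_l):
        s_h = math.tanh(sum(w * v for w, v in zip(score_vec, h_row)))
        s_z = math.tanh(sum(w * v for w, v in zip(score_vec, z_row)))
        m = max(s_h, s_z)  # subtract the max for a numerically stable softmax
        e_h, e_z = math.exp(s_h - m), math.exp(s_z - m)
        a_h, a_z = e_h / (e_h + e_z), e_z / (e_h + e_z)
        weights.append((a_h, a_z))
        fused.append([a_h * h + a_z * z for h, z in zip(h_row, z_row)])
    return fused, weights


def gcn_layer(a_norm, x, w):
    """One GCN layer: ReLU(A_norm X W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, x), w)]


# Toy example: 3 cells with 2 genes each, and 2-dim embeddings at layer l.
expr = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
adj = knn_graph(expr, k=1)                   # cells 0-1 and 1-2 become neighbors
h_l = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # from the ZINB-based autoencoder
z_l = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]   # from the graph autoencoder
fused, weights = attention_fuse(h_l, z_l, score_vec=[1.0, -1.0])
z_next = gcn_layer(normalize_adjacency(adj), fused, w=[[1.0, 0.0], [0.0, 1.0]])
```

The per-cell weights let each cell draw more on content or on structure. This is one way such a design can counter GCN oversmoothing.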