Single-cell Curriculum Learning-based Deep Graph Embedding Clustering

Huifa Li; Jie Fu; Xinpeng Ling; Zhiyu Sun; Kuncan Wang; Zhili Chen

TL;DR

scCLG, a single-cell curriculum learning-based deep graph embedding clustering method, learns cell-cell topology representations by combining three optimization objectives: a topology reconstruction loss on the cell graph, a zero-inflated negative binomial (ZINB) loss, and a clustering loss.

Abstract

The rapid advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of tissue heterogeneity at the cellular level. Cell annotation contributes significantly to the extensive downstream analysis of scRNA-seq data. However, analyzing scRNA-seq data for biological inference is challenging because of its intricate and indeterminate data distribution, which is characterized by a large volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of graph neural networks (GNNs), a popular solution for scRNA-seq data clustering, can be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose a single-cell curriculum learning-based deep graph embedding clustering method (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-criteria (ChebAE) that combines three optimization objectives, namely a topology reconstruction loss on cell graphs, a zero-inflated negative binomial (ZINB) loss, and a clustering loss, to learn cell-cell topology representations. Meanwhile, we employ a selective training strategy that trains the GNN based on the features and entropy of nodes and prunes difficult nodes according to their difficulty scores to retain a high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods. The code of scCLG will be made publicly available at https://github.com/LFD-byte/scCLG.
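To make the ZINB objective named in the abstract concrete, the following is a minimal sketch of the standard zero-inflated negative binomial negative log-likelihood for a single count. It is not taken from the scCLG code; the function names `nb_log_pmf` and `zinb_nll` and the parameter names `mu` (mean), `theta` (dispersion), and `pi` (dropout probability) are assumptions chosen for illustration.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu and dispersion theta (hypothetical helper)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under ZINB.

    pi is the probability of a dropout (structural zero); with
    probability 1 - pi the count is drawn from the negative binomial.
    """
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        likelihood = pi + (1.0 - pi) * nb
    else:
        # A nonzero count can only come from the NB component.
        likelihood = (1.0 - pi) * nb
    return -math.log(likelihood)


# A zero count is less surprising when the dropout probability is high.
loss_low_dropout = zinb_nll(0, mu=2.0, theta=1.0, pi=0.1)
loss_high_dropout = zinb_nll(0, mu=2.0, theta=1.0, pi=0.5)
```

In a model such as the one described, the per-entry loss would typically be averaged over all cells and genes, with `mu`, `theta`, and `pi` produced by a decoder; that wiring is omitted here.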

Paper Structure (23 sections, 16 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Notations
Chebyshev Graph Convolution
Proposed Approach
Multi-Criteria ChebConv Graph Autoencoder
Reconstruction Loss
ZINB Loss
Clustering Loss
Curriculum Learning with Data Pruning
Hierarchical Difficulty Measurer
Data Pruning
The Proposed scCLG Algorithm
Experiments
...and 8 more sections

Figures (4)

Figure 1: Framework of scCLG. (A) Pre-training: the proposed ChebAE is pre-trained with an adjacency matrix decoder and a ZINB decoder; node difficulty is then calculated with a hierarchical difficulty measurer and the data are pruned. (B) Formal training: all three criteria are used to optimize the model in more detail, following an easy-to-hard pattern on the pruned data.
Figure 2: The model architecture of multi-criteria ChebAE. ChebAE integrates three loss components (a reconstruction loss, a ZINB loss, and a clustering loss) to optimize the low-dimensional latent representation.
Figure 3: Parameter analysis. (A) Comparison of the average ARI and NMI values with different neighbor parameters $k$. (B) Comparison of the average ARI and NMI values with different numbers of genes.
Figure 4: Comparison of the average ARI and NMI values with different data pruning rates and pruning strategies.
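
The pruning rate varied in Figure 4 can be illustrated with a minimal sketch. It is a simplified stand-in, not the paper's hierarchical difficulty measurer: it assumes that a node's difficulty is the Shannon entropy of its soft cluster assignment, removes the hardest fraction of nodes, and returns the rest ordered from easy to hard. The names `assignment_entropy`, `prune_and_order`, and `prune_rate` are hypothetical.

```python
import math


def assignment_entropy(probs):
    """Shannon entropy of one node's soft cluster assignment.
    Higher entropy means a more ambiguous (harder) node."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prune_and_order(soft_assignments, prune_rate):
    """Drop the hardest nodes and return the kept indices, easy first.

    Assumption: difficulty is the assignment entropy alone.
    """
    scores = [assignment_entropy(p) for p in soft_assignments]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pruned = int(prune_rate * len(order))
    return order[: len(order) - n_pruned]


# Toy soft assignments for four nodes over three clusters.
soft = [
    [0.98, 0.01, 0.01],  # confident node
    [0.34, 0.33, 0.33],  # boundary-like node, nearly uniform
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
kept = prune_and_order(soft, prune_rate=0.25)
print(kept)  # indices of retained nodes, ordered easy to hard
```

A curriculum schedule would then feed the retained nodes to training in this order, starting with the most confident ones.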
