Heterogeneous gene network estimation for single-cell transcriptomic data via a joint regularized deep neural network

Jingyuan Yang, Tao Li, Tianyi Wang, Shuangge Ma, Mengyun Wu

TL;DR

Single-cell gene network estimation faces cellular heterogeneity and dropout (zero inflation). The authors propose JRDNN-KM—a joint regularized deep neural network with Mahalanobis-distance-based K-means—to simultaneously estimate networks for $K$ subgroups and identify subgroup labels. The framework uses a zero-inflated conditional Gaussian model with nonlinear gene-gene relationships via a neural-network function, includes both homogeneous and subgroup-specific hidden neurons, and imposes sparsity and cross-subgroup similarity penalties. Applied to five real datasets and application-based simulations, it achieves superior subgroup identification and edge recovery, reveals hub genes and GO-enriched modules, and demonstrates robustness and interpretability. The approach advances heterogeneous network estimation for single-cell data and provides accessible code for broader use.

Abstract

Estimation of intracellular gene networks has been a critical component of single-cell transcriptomic data analysis, which can provide crucial insights into the complex interplay between genes, facilitating the discovery of the biological basis of human life at single-cell resolution. Despite notable achievements, existing methodologies often falter in their practicality, primarily due to their narrow focus on simplistic linear relationships and inadequate handling of cellular heterogeneity. To bridge these gaps, we propose a joint regularized deep neural network method incorporating Mahalanobis distance-based K-means clustering (JRDNN-KM) to estimate multiple networks for various cell subgroups simultaneously, accounting for both unknown cellular heterogeneity and zero inflation, and, more importantly, complex nonlinear relationships among genes. We introduce an innovative selection layer for network construction, along with hidden layers that include both shared and subgroup-specific neurons, to capture common patterns and subgroup-specific variations across networks. Applied to real single-cell transcriptomic data from multiple tissues and species, JRDNN-KM demonstrates higher accuracy and biological interpretability in network estimation, and more accurately identifies cell subgroups compared to current state-of-the-art methods. Building on network construction, we further find hub genes with important biological implications and modules with statistical enrichment of biological processes.
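To make the clustering step more concrete, the following is a minimal sketch of a Mahalanobis-distance-based assignment step, of the kind a K-means variant alternates with re-estimation. It is not the authors' implementation: the function name `mahalanobis_assign`, the `ridge` stabilizer, and the assumption that per-subgroup means and covariance matrices are already available are all illustrative choices made here.

```python
import numpy as np

def mahalanobis_assign(X, means, covs, ridge=1e-6):
    """Assign each cell (row of X) to the subgroup with the smallest
    squared Mahalanobis distance.

    X     : (n_cells, n_genes) expression matrix
    means : list of K mean vectors, each of shape (n_genes,)
    covs  : list of K covariance matrices, each (n_genes, n_genes)
    ridge : small diagonal term added for numerical stability
            (an assumption of this sketch, not taken from the paper)
    """
    n, p = X.shape
    K = len(means)
    d2 = np.empty((n, K))
    for k in range(K):
        diff = X - means[k]                      # (n, p) centered cells
        cov = covs[k] + ridge * np.eye(p)        # regularized covariance
        z = np.linalg.solve(cov, diff.T)         # cov^{-1} diff^T, shape (p, n)
        d2[:, k] = np.einsum("ij,ji->i", diff, z)  # diff_i^T cov^{-1} diff_i
    return d2.argmin(axis=1), d2

# Toy usage with two hypothetical subgroups in two dimensions.
X = np.array([[2.0, 0.0], [0.0, 0.0], [3.0, 0.2]])
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
covs = [np.diag([9.0, 1.0]), np.eye(2)]
labels, d2 = mahalanobis_assign(X, means, covs)
print(labels)
```

In this toy example the first cell is closer to the second mean in Euclidean terms, yet it is assigned to the first subgroup because that subgroup has a large variance along the first axis. This is the kind of behavior that motivates using a Mahalanobis distance instead of a Euclidean one.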

Paper Structure

This paper contains 15 sections, 3 equations, 5 figures.

Figures (5)

  • Figure 1: Workflow of JRDNN-KM. (A) Input: normalized single-cell transcriptome data $\boldsymbol{x}_i$'s and initialized cell subgroup memberships $C_i$'s. (B) JRDNN-KM: iterations between JRDNN and Mahalanobis distance-based K-means clustering. (C) Output: $K$ networks for the $K$ subgroups, with both common and specific edges, and estimated cell subgroup memberships. (D) JRDNN architecture: The architecture comprises a selection layer and hidden layers with combined homogeneous and heterogeneous neurons, and is optimized by a zero-inflated loss function that includes sparse and similarity regularization terms.
  • Figure 2: Heterogeneity analysis results on five real single-cell transcriptomic datasets. (A) ARI and NMI values with different methods. (B) Proportions of identified cell subgroups with JRDNN-KM in the true cell subgroups.
  • Figure 3: F1 scores of different methods evaluated against four reference networks across five datasets. Shapes represent methods, colors indicate reference networks, and horizontal lines show mean values across the four networks.
  • Figure 4: Networks constructed with JRDNN-KM for subgroup 1 (A), subgroup 2 (C), and subgroup 3 (E) of the LUAD dataset. Connections common to all subgroups are drawn as thick edges, with the genes involved shown in red, and the ten hub genes with the largest degrees are drawn as larger points. (B), (D), and (F): Two representative communities detected using the Louvain algorithm for subgroup 1, subgroup 2, and subgroup 3, respectively.
  • Figure 5: Comparative performance of different methods on three application-based simulation datasets; (A)-(D): ARI, F1, Recall, and Precision values.
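
The edge-recovery metrics (Precision, Recall, F1) and the degree-based hub genes mentioned in the captions above can be illustrated with a short sketch. It assumes undirected networks stored as symmetric adjacency matrices. The helper names `edge_scores` and `top_degree_genes` and the toy gene labels are hypothetical and do not come from the paper's code.

```python
import numpy as np

def edge_scores(est_adj, ref_adj):
    """Precision, recall, and F1 of estimated edges against a reference
    network, counting each undirected edge once (upper triangle only)."""
    est = np.asarray(est_adj) != 0
    ref = np.asarray(ref_adj) != 0
    iu = np.triu_indices(est.shape[0], k=1)   # skip diagonal and duplicates
    e, r = est[iu], ref[iu]
    tp = int(np.sum(e & r))                   # edges present in both
    fp = int(np.sum(e & ~r))                  # estimated but not in reference
    fn = int(np.sum(~e & r))                  # in reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def top_degree_genes(adj, genes, top=10):
    """Return the `top` genes with the largest degree as (gene, degree)."""
    a = (np.asarray(adj) != 0).astype(int)    # astype returns a copy
    np.fill_diagonal(a, 0)                    # ignore self-loops
    deg = a.sum(axis=1)
    order = np.argsort(-deg, kind="stable")[:top]
    return [(genes[i], int(deg[i])) for i in order]

# Toy usage: four hypothetical genes; the reference is a star centered on g0.
genes = ["g0", "g1", "g2", "g3"]
ref = np.zeros((4, 4), dtype=int)
est = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (0, 2), (0, 3)]:
    ref[i, j] = ref[j, i] = 1
for i, j in [(0, 1), (0, 2), (1, 2)]:
    est[i, j] = est[j, i] = 1

precision, recall, f1 = edge_scores(est, ref)
hubs = top_degree_genes(ref, genes, top=1)
print(precision, recall, f1, hubs)
```

Here two of the three estimated edges appear in the reference and one reference edge is missed, so precision, recall, and F1 all equal 2/3, and the hub of the reference star is `g0` with degree 3.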