Heterogeneous gene network estimation for single-cell transcriptomic data via a joint regularized deep neural network
Jingyuan Yang, Tao Li, Tianyi Wang, Shuangge Ma, Mengyun Wu
TL;DR
Single-cell gene network estimation must contend with cellular heterogeneity and dropout (zero inflation). The authors propose JRDNN-KM, a joint regularized deep neural network combined with Mahalanobis-distance-based K-means, which simultaneously estimates networks for $K$ subgroups and identifies the subgroup label of each cell. The framework uses a zero-inflated conditional Gaussian model, captures nonlinear gene-gene relationships through a neural-network function, includes both homogeneous (shared) and subgroup-specific hidden neurons, and imposes sparsity and cross-subgroup similarity penalties. On five real datasets and application-based simulations, it achieves superior subgroup identification and edge recovery, reveals hub genes and GO-enriched modules, and demonstrates robustness and interpretability. The approach advances heterogeneous network estimation for single-cell data and provides accessible code for broader use.
Abstract
Estimation of intracellular gene networks is a critical component of single-cell transcriptomic data analysis. It can provide crucial insights into the complex interplay between genes, facilitating the discovery of the biological basis of human life at single-cell resolution. Despite notable achievements, existing methodologies often fall short in practice, primarily because they focus narrowly on simple linear relationships and handle cellular heterogeneity inadequately. To bridge these gaps, we propose a joint regularized deep neural network method incorporating Mahalanobis distance-based K-means clustering (JRDNN-KM) to estimate multiple networks for various cell subgroups simultaneously, accounting for unknown cellular heterogeneity, zero inflation, and, more importantly, complex nonlinear relationships among genes. We introduce an innovative selection layer for network construction, along with hidden layers that include both shared and subgroup-specific neurons, to capture common patterns and subgroup-specific variations across networks. Applied to real single-cell transcriptomic data from multiple tissues and species, JRDNN-KM demonstrates higher accuracy and biological interpretability in network estimation, and identifies cell subgroups more accurately than current state-of-the-art methods. Building on network construction, we further find hub genes with important biological implications and modules with statistical enrichment of biological processes.
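
To make the clustering component more concrete, the following is a minimal sketch of a Mahalanobis-distance-based assignment step, in which each cell is assigned to the subgroup whose center is closest under that subgroup's own distance metric. It is an illustration only and not the authors' implementation: the function names (`mahalanobis_sq`, `assign_subgroups`), the toy two-gene data, and the use of a fixed per-subgroup precision matrix are all assumptions made here, and the sketch omits the neural network, the zero-inflation model, and the penalties.

```python
def mahalanobis_sq(x, mean, precision):
    """Squared Mahalanobis distance (x - mean)^T P (x - mean).

    `precision` is an inverse covariance matrix given as a list of rows
    (assumed known here; a hypothetical stand-in for a fitted model).
    """
    diff = [xi - mi for xi, mi in zip(x, mean)]
    n = len(diff)
    return sum(diff[i] * precision[i][j] * diff[j]
               for i in range(n) for j in range(n))


def assign_subgroups(cells, means, precisions):
    """Assign each cell to the subgroup with the smallest distance."""
    labels = []
    for x in cells:
        dists = [mahalanobis_sq(x, m, p) for m, p in zip(means, precisions)]
        labels.append(dists.index(min(dists)))
    return labels


# Toy example with two genes and K = 2 subgroups (made-up numbers).
means = [[0.0, 0.0], [4.0, 4.0]]
precisions = [
    [[1.0, 0.0], [0.0, 1.0]],   # subgroup 0: tight, unit variance
    [[0.1, 0.0], [0.0, 0.1]],   # subgroup 1: diffuse, variance 10
]
cells = [[0.0, 0.0], [1.5, 1.5], [4.0, 4.0]]

labels = assign_subgroups(cells, means, precisions)
print(labels)
```

In this toy setup the middle cell `[1.5, 1.5]` is closer to the first center in Euclidean terms but is assigned to the second subgroup, because that subgroup's larger variance shrinks its Mahalanobis distance. This is the kind of behavior that motivates using a subgroup-specific metric rather than plain K-means.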
