Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning
Sindhura Kommu, Yizhi Wang, Yue Wang, Xuan Wang
TL;DR
This work tackles gene regulatory network inference from single-cell RNA-seq data by introducing scTransNet, a joint framework that fuses pre-trained single-cell transformer representations with structured GRN knowledge via a graph neural network. The model combines a scBERT-based encoding layer, attentive pooling across cells, a GNN over a prior GRN, and a final output layer that merges these representations to predict gene–gene regulatory interactions. Empirical results on BEELINE datasets show consistent improvements over a wide range of baselines in AUROC and AUPRC, and ablation studies demonstrate the critical contributions of the GNN encoder, scBERT, and attentive pooling. Overall, the approach advances GRN inference by leveraging both rich contextual gene representations and explicit network structure, with potential implications for understanding cellular regulatory mechanisms and improving interpretability in single-cell analyses.
Abstract
Inferring gene regulatory networks (GRNs) from single-cell RNA sequencing (scRNA-seq) data is a complex challenge that requires capturing the intricate relationships between genes and their regulatory interactions. In this study, we tackle this challenge by leveraging the single-cell BERT-based pre-trained transformer model (scBERT), trained on extensive unlabeled scRNA-seq data, to augment structured biological knowledge from existing GRNs. We introduce a novel joint graph learning approach that combines the rich contextual representations learned by pre-trained single-cell language models with the structured knowledge encoded in GRNs using graph neural networks (GNNs). By integrating these two modalities, our approach reasons over both the gene-expression-level constraints provided by the scRNA-seq data and the structured biological knowledge inherent in GRNs. We evaluate our method on human cell benchmark datasets from the BEELINE study with cell type-specific ground truth networks. The results demonstrate superior performance over current state-of-the-art baselines and offer a deeper understanding of cellular regulatory mechanisms.
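To make the data flow summarized above concrete, the following is a minimal NumPy sketch of the three stages that follow the encoder: attentive pooling of per-cell gene embeddings, one graph-convolution layer over a prior GRN, and a joint output layer that scores candidate gene pairs. It is not the authors' implementation. The function names, dimensions, dot-product attention form, symmetric GCN normalization, and concatenation-based scorer are all assumptions chosen for illustration, and random arrays stand in for scBERT outputs and learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(cell_embeddings, query):
    # cell_embeddings: (n_genes, n_cells, d), one vector per gene per cell.
    # query: (d,), a hypothetical attention vector (learned in a real model).
    scores = cell_embeddings @ query              # (n_genes, n_cells)
    weights = softmax(scores, axis=1)             # attention over cells
    pooled = (weights[..., None] * cell_embeddings).sum(axis=1)
    return pooled, weights                        # (n_genes, d), (n_genes, n_cells)

def gcn_layer(adjacency, features, weight):
    # One graph-convolution layer with self-loops and symmetric normalization.
    # Assumption: the prior GRN is treated as undirected for simplicity.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)   # ReLU

def score_pairs(pooled, graph_emb, pairs, w_out):
    # Merge both views per gene, then score each (regulator, target) pair.
    joint = np.concatenate([pooled, graph_emb], axis=1)  # (n_genes, d + h)
    pair_features = np.concatenate(
        [joint[pairs[:, 0]], joint[pairs[:, 1]]], axis=1
    )
    logits = pair_features @ w_out
    return 1.0 / (1.0 + np.exp(-logits))                 # edge probabilities

# Toy example: random stand-ins for encoder outputs and learned parameters.
rng = np.random.default_rng(0)
n_genes, n_cells, d, h = 5, 8, 4, 3

cell_emb = rng.normal(size=(n_genes, n_cells, d))
prior = np.zeros((n_genes, n_genes))
prior[0, 1] = prior[1, 0] = 1.0    # hypothetical prior edges
prior[2, 3] = prior[3, 2] = 1.0

pooled, weights = attentive_pool(cell_emb, rng.normal(size=d))
graph_emb = gcn_layer(prior, rng.normal(size=(n_genes, d)), rng.normal(size=(d, h)))
pairs = np.array([[0, 1], [2, 4]])  # candidate (regulator, target) pairs
probs = score_pairs(pooled, graph_emb, pairs, rng.normal(size=2 * (d + h)))
```

In a trained model the attention query, GCN weights, and output weights would be learned from known regulatory edges; the sketch only illustrates how the pooled transformer view and the graph view of each gene might be merged before scoring a pair.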
