Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning

Sindhura Kommu; Yizhi Wang; Yue Wang; Xuan Wang

TL;DR

This work tackles gene regulatory network inference from single-cell RNA-seq data by introducing scTransNet, a joint framework that fuses pre-trained single-cell transformer representations with structured GRN knowledge via a graph neural network. The model combines a scBERT-based encoding layer, attentive pooling across cells, a GNN over a prior GRN, and a final output layer that merges these representations to predict gene–gene regulatory interactions. Empirical results on BEELINE datasets show consistent improvements over a wide range of baselines in AUROC and AUPRC, and ablation studies demonstrate the critical contributions of the GNN encoder, scBERT, and attentive pooling. Overall, the approach advances GRN inference by leveraging both rich contextual gene representations and explicit network structure, with potential implications for understanding cellular regulatory mechanisms and improving interpretability in single-cell analyses.
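
The four components named above can be sketched end to end in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: random arrays stand in for scBERT outputs, the scoring vector and weight matrix are hypothetical learned parameters, the GNN is reduced to a single graph-convolution step over a symmetrized prior network, the merge is assumed to be a concatenation, and the edge scorer is assumed to be a sigmoid over a dot product.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(cell_embeddings, w):
    """Pool (n_cells, n_genes, d) per-cell gene embeddings into (n_genes, d).

    w is a hypothetical learned scoring vector of shape (d,).
    """
    scores = cell_embeddings @ w              # (n_cells, n_genes)
    alpha = softmax(scores, axis=0)           # attention over cells, per gene
    return (alpha[..., None] * cell_embeddings).sum(axis=0)

def gcn_layer(adj, x, weight):
    """One graph-convolution step with self-loops and symmetric normalization.

    Assumes an undirected (symmetrized) prior network for simplicity.
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ x @ weight, 0.0)  # ReLU

def score_edges(h, pairs):
    """Assumed scorer: sigmoid of the dot product of the two gene vectors."""
    z = (h[pairs[:, 0]] * h[pairs[:, 1]]).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-z))

n_cells, n_genes, d = 5, 4, 8

# Stand-in for contextual gene embeddings produced per cell by scBERT.
cell_emb = rng.normal(size=(n_cells, n_genes, d))
pooled = attentive_pool(cell_emb, rng.normal(size=d))

# Toy prior GRN (symmetrized) and expression-based node features.
prior_adj = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
expr = rng.normal(size=(n_genes, n_cells))
graph_emb = gcn_layer(prior_adj, expr, rng.normal(size=(n_cells, d)))

# Assumed merge: concatenate contextual and graph representations per gene.
fused = np.concatenate([pooled, graph_emb], axis=1)

# Score two candidate regulator-target pairs.
pairs = np.array([[0, 1], [0, 3]])
probs = score_edges(fused, pairs)
```

In this sketch the attention weights are normalized over cells separately for each gene, so a gene's pooled vector is a convex combination of its per-cell embeddings.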

Abstract

Inferring gene regulatory networks (GRNs) from single-cell RNA sequencing (scRNA-seq) data is a complex challenge that requires capturing the intricate relationships between genes and their regulatory interactions. In this study, we tackle this challenge by leveraging the single-cell BERT-based pre-trained transformer model (scBERT), trained on extensive unlabeled scRNA-seq data, to augment structured biological knowledge from existing GRNs. We introduce a novel joint graph learning approach that combines the rich contextual representations learned by pre-trained single-cell language models with the structured knowledge encoded in GRNs using graph neural networks (GNNs). By integrating these two modalities, our approach effectively reasons over both the gene-expression-level constraints provided by the scRNA-seq data and the structured biological knowledge inherent in GRNs. We evaluate our method on human cell benchmark datasets from the BEELINE study with cell type-specific ground truth networks. The results demonstrate superior performance over current state-of-the-art baselines, offering a deeper understanding of cellular regulatory mechanisms.

Paper Structure (15 sections, 14 equations, 3 figures, 1 table)

Introduction
Related Work
Approach
BERT Encoding Layer
Attentive Pooling
GRN encoding with GNNs
Final Output Layer
Experimental Setup
Benchmark scRNA-seq datasets
Implementation and Training details
Baseline Methods
Results
Performance on benchmark datasets
Discussion and Ablations
Conclusion and Future Work

Figures (3)

Figure 1: Overview of the scTransNet framework for supervised GRN inference, with the BERT Encoding Layer (top left), Attentive Pooling (top right), GRN encoding with GNNs (bottom left), and the Final Output Layer (bottom right). It augments the output of the graph encoder (for knowledge understanding) with that of the scBERT encoder (for contextual understanding) to infer regulatory interdependencies between genes.
Figure 2: Summary of the GRN prediction performance of scTransNet in terms of (A) the AUROC metric (top) and (B) the AUPRC metric (bottom). Our evaluation is conducted on two human single-cell RNA sequencing (scRNA-seq) datasets, with a cell-type-specific ground-truth network. The scRNA-seq datasets consist of significantly varying transcription factors (TFs) and the 500 or 1000 most-varying genes.
Figure 3: GRN prediction performance of scTransNet on a partial ground truth subgraph. Solid line edges depict ground truth regulatory interactions correctly predicted by scTransNet but missed by the baseline GENELink method, which relies solely on graph representations. Notably, scTransNet effectively identified all regulatory links predicted by GENELink (not visualized). Dotted line edges represent ground truth interactions that scTransNet failed to capture, revealing its limitations and providing insights for further improvement. Overall, this highlights scTransNet's strength in leveraging joint learning to uncover additional true regulatory interactions beyond graphs.
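
For readers unfamiliar with the two metrics reported in Figure 2, the following is a minimal sketch of how AUROC and AUPRC can be computed for a set of scored candidate edges. It is an illustration under assumptions, not the paper's evaluation code: AUROC is computed as the fraction of correctly ordered positive/negative pairs, AUPRC is approximated by average precision (one common estimator), and the labels and scores are made-up toy values.

```python
import numpy as np

def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (pos.size * neg.size))

def average_precision(labels, scores):
    """Mean of precision@k taken at the rank of each true edge."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())

# Toy example: 1 marks a true regulatory edge, 0 a non-edge.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]

toy_auroc = auroc(toy_labels, toy_scores)            # 5 of 6 pairs ordered correctly
toy_ap = average_precision(toy_labels, toy_scores)   # mean of 1/1 and 2/3
```

AUPRC is typically the more informative of the two when true edges are rare among candidate pairs, since it is not inflated by the large number of easy negatives.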