GRNFormer: A Biologically-Guided Framework for Integrating Gene Regulatory Networks into RNA Foundation Models

Mufan Qiu, Xinyu Hu, Fengwei Zhan, Sukwon Yun, Jie Peng, Ruichen Zhang, Bhavya Kailkhura, Jiekun Yang, Tianlong Chen

TL;DR

GRNFormer addresses a key limitation of RNA foundation models (FMs), namely that they ignore regulatory prior knowledge, by integrating multi-scale gene regulatory networks (GRNs) inferred from multi-omics data into pretraining. It presents a structure-aware fusion framework that combines adaptive cross-attention with a biology-guided edge perturbation strategy to compensate for GRN sparsity and enable effective knowledge transfer to RNA FMs. The method constructs cell-type-specific and single-cell GRNs via SCENIC+ and AUCell thresholds, then fuses these priors with expression embeddings across multiple backbone architectures. Across gene perturbation prediction, cancer drug response prediction, and single-cell drug response classification, GRNFormer achieves consistent improvements over baselines and yields interpretable attention patterns that align with known regulatory biology.

Abstract

Foundation models for single-cell RNA sequencing (scRNA-seq) have shown promising capabilities in capturing gene expression patterns. However, current approaches face critical limitations: they ignore biological prior knowledge encoded in gene regulatory relationships and fail to leverage multi-omics signals that could provide complementary regulatory insights. In this paper, we propose GRNFormer, a new framework that systematically integrates multi-scale Gene Regulatory Networks (GRNs) inferred from multi-omics data into RNA foundation model training. Our framework introduces two key innovations. First, we introduce a pipeline for constructing hierarchical GRNs that capture regulatory relationships at both cell-type-specific and cell-specific resolutions. Second, we design a structure-aware integration framework that addresses the information asymmetry in GRNs through two technical advances: (1) a graph topological adapter that uses multi-head cross-attention to weight regulatory relationships dynamically, and (2) a novel edge perturbation strategy that perturbs GRNs with biologically informed co-expression links to augment graph neural network training. Comprehensive experiments on three representative downstream tasks across multiple model architectures demonstrate the effectiveness of GRNFormer. It achieves consistent improvements over state-of-the-art (SoTA) baselines: a $3.6\%$ increase in drug response prediction correlation, a $9.6\%$ improvement in single-cell drug classification AUC, and a $1.1\%$ average gain in gene perturbation prediction accuracy.
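
One way to picture the graph topological adapter described above is as a cross-attention layer in which expression embeddings attend over GRN-derived structure embeddings. The NumPy sketch below is illustrative only: the function name `cross_attention_fuse`, the choice of expression embeddings as queries, the residual connection, and the random projection matrices are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(expr, struct, w_q, w_k, w_v, w_o, num_heads):
    """Fuse expression and structure embeddings with multi-head cross-attention.

    expr:   (n_genes, d) expression embeddings, used as queries (assumption).
    struct: (n_genes, d) structure embeddings, used as keys and values.
    Returns the fused (n_genes, d) embedding and the per-head attention maps.
    """
    n, d = expr.shape
    assert d % num_heads == 0, "embedding size must be divisible by num_heads"
    head_dim = d // num_heads

    def split_heads(x):
        # (n, d) -> (num_heads, n, head_dim)
        return x.reshape(x.shape[0], num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(expr @ w_q)
    k = split_heads(struct @ w_k)
    v = split_heads(struct @ w_v)

    # Scaled dot-product scores: one (n, n) map per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1)

    # Merge heads back to (n, d) and apply the output projection.
    context = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    fused = expr + context @ w_o  # residual connection (assumption)
    return fused, attn


# Toy inputs: 5 genes, 8-dimensional embeddings, 2 heads (all hypothetical).
rng = np.random.default_rng(0)
n_genes, dim, heads = 5, 8, 2
expr_emb = rng.normal(size=(n_genes, dim))
struct_emb = rng.normal(size=(n_genes, dim))
w_q, w_k, w_v, w_o = (0.1 * rng.normal(size=(dim, dim)) for _ in range(4))

fused_emb, attn_maps = cross_attention_fuse(
    expr_emb, struct_emb, w_q, w_k, w_v, w_o, num_heads=heads
)
print(fused_emb.shape, attn_maps.shape)
```

Each row of every attention map sums to one, so a row can be read as how strongly a gene's expression embedding draws on each structure embedding; this is the kind of quantity one would inspect when looking for regulatory patterns.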

Paper Structure

This paper contains 19 sections, 4 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: Gene regulatory process in scATAC-seq and scRNA-seq modalities. Image credit to bonev2024opportunities.
  • Figure 2: Overview of the GRNFormer framework: (A) Multi-scale GRN construction from scATAC-seq/scRNA-seq data, using additional motif databases; (B) Single-cell RNA foundation models (scRNA FMs) encode gene expression profiles into expression embeddings, with three model architectures supported as backbones: scGPT, scFoundation, and scPaLM; (C) The multi-scale GRNs are perturbed using co-expression graphs and then processed through GNN modules, and the resulting embeddings are aggregated via summation to produce the structure embedding; (D) The expression embedding and structure embedding from the previous two stages are fused through a cross-attention layer. The resulting hybrid embedding can be fed into the decoder for pretraining with masked language modeling objectives, or used directly for diverse downstream tasks.
  • Figure 3: Cancer drug response prediction evaluation.
  • Figure 4: Pairwise comparison of the Pearson correlation coefficients of scFoundation and scPaLM under different grouping strategies. Left: grouped by cell line; Middle: grouped by cancer type; Right: grouped by drug type. The red lines mark $y = x$.
  • Figure 5: (A) Distribution of average attention scores for transcription factor (TF) and non-transcription factor (non-TF) nodes; (B) Node degree distributions for these two types of nodes. TF nodes appear to connect to more genes and also exhibit higher attention weights.
  • ...and 1 more figure
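
The grouped comparison in Figure 4 rests on computing a Pearson correlation coefficient separately within each group (cell line, cancer type, or drug). A minimal sketch of that computation follows, assuming predictions are stored as records with the hypothetical keys `cell_line`, `drug`, `true`, and `pred`; the helper names and the toy values are invented for illustration and are not results from the paper.

```python
from collections import defaultdict
from math import isnan, sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def grouped_pearson(records, key):
    """Compute one Pearson coefficient per distinct value of records[key]."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        true_vals, pred_vals = groups[rec[key]]
        true_vals.append(rec["true"])
        pred_vals.append(rec["pred"])
    # Groups with a single record cannot yield a correlation, so skip them.
    return {
        group: pearson(true_vals, pred_vals)
        for group, (true_vals, pred_vals) in groups.items()
        if len(true_vals) >= 2
    }


# Toy drug-response records (hypothetical values, not from the paper).
records = [
    {"cell_line": "A", "drug": "d1", "true": 1.0, "pred": 2.0},
    {"cell_line": "A", "drug": "d2", "true": 2.0, "pred": 4.0},
    {"cell_line": "A", "drug": "d3", "true": 3.0, "pred": 6.0},
    {"cell_line": "B", "drug": "d1", "true": 1.0, "pred": 3.0},
    {"cell_line": "B", "drug": "d2", "true": 2.0, "pred": 2.0},
    {"cell_line": "B", "drug": "d3", "true": 3.0, "pred": 1.0},
]

by_cell_line = grouped_pearson(records, "cell_line")
by_drug = grouped_pearson(records, "drug")
print(by_cell_line)
```

Running this once per model gives one coefficient per group for each model; plotting the two sets against each other yields a scatter of the kind the caption describes, where points above $y = x$ favor the model on the vertical axis.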