Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction

Jiafa Ruan, Ruijie Quan, Zongxin Yang, Liyang Xu, Yi Yang

Abstract

Predicting transcriptional responses to genetic perturbations is a central problem in functional genomics. In practice, perturbation responses are rarely gene-independent but instead manifest as coordinated, program-level transcriptional changes among functionally related genes. However, most existing methods do not explicitly model such coordination, due to gene-wise modeling paradigms and reliance on static biological priors that cannot capture dynamic program reorganization. To address these limitations, we propose scBIG, a module-inductive perturbation prediction framework that explicitly models coordinated gene programs. scBIG induces coherent gene programs from data via Gene-Relation Clustering, captures inter-program interactions through a Gene-Cluster-Aware Encoder, and preserves modular coordination using structure-aware alignment objectives. These structured representations are then modeled using conditional flow matching to enable flexible and generalizable perturbation prediction. Extensive experiments on multiple single-cell perturbation benchmarks show that scBIG consistently outperforms state-of-the-art methods, particularly on unseen and combinatorial perturbation settings, achieving an average improvement of 6.7% over the strongest baselines.
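The abstract's conditional flow matching step can be illustrated with a minimal NumPy sketch. It assumes the common straight-line probability path between a control latent `z0` and a perturbed latent `z1`; the abstract does not state which path, conditioning scheme, or network scBIG actually uses. The function names (`cfm_training_pair`, `cfm_loss`, `euler_integrate`) and the random latents are hypothetical and for illustration only.

```python
import numpy as np


def cfm_training_pair(z0, z1, t):
    """Return a point on the straight path from z0 to z1 and its target velocity.

    Assumption: linear interpolation path z_t = (1 - t) * z0 + t * z1,
    whose time derivative is the constant vector z1 - z0.
    """
    t = t[:, None]                      # broadcast one time value per sample
    z_t = (1.0 - t) * z0 + t * z1
    target_velocity = z1 - z0
    return z_t, target_velocity


def cfm_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    diff = predicted_velocity - target_velocity
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def euler_integrate(velocity_fn, z0, n_steps=10):
    """Integrate dz/dt = velocity_fn(z, t) from t = 0 to t = 1 with Euler steps."""
    z = z0.copy()
    dt = 1.0 / n_steps
    for step in range(n_steps):
        z = z + dt * velocity_fn(z, step * dt)
    return z


rng = np.random.default_rng(0)
z0 = rng.normal(size=(4, 8))            # hypothetical control latents
z1 = rng.normal(size=(4, 8))            # hypothetical perturbed latents
t = rng.uniform(size=4)                 # one random time per sample

z_t, u_t = cfm_training_pair(z0, z1, t)
loss_perfect = cfm_loss(u_t, u_t)       # an exact velocity prediction gives zero loss

# With the exact (constant) velocity field, integration carries z0 to z1.
z1_hat = euler_integrate(lambda z, time: z1 - z0, z0)
```

In a trained model, the lambda above would be replaced by a learned network that also receives the perturbation condition; here it is the exact velocity only to show that integrating the field transports control states to perturbed states.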

Paper Structure (43 sections, 11 equations, 9 figures, 10 tables, 3 algorithms)

Figures (9)

  • Figure 1: (a): Comparison between gene-wise view and module view in perturbation responses. Black indicates the perturbed gene, red indicates upregulation, and blue indicates downregulation. (b): Quantitative comparison of our method and state-of-the-art approaches on the Norman additive split.
  • Figure 2: Overview of the scBIG framework (§\ref{sec_app:method}). (Left) Gene-Relation Clustering (GRC) partitions the unordered gene space into $K$ biologically coherent modules via optimal transport, integrating semantic embeddings from foundation models with high-confidence PPI priors. (Middle) Generative Backbone. The framework encodes cells using the Gene-Cluster-Aware Encoder (GCAE), which captures high-order inter-module interactions through a bottleneck attention mechanism with inducing points (Top). These latent representations guide a Conditional Flow Matching module (Bottom) to model the continuous transition from control ($z_0$) to perturbed ($z_1$) states. (Right) Structure-Aware Alignment. To ensure phenotypic fidelity, the entire pipeline is jointly optimized with two structural regularizers: Cluster Correlation Alignment, which preserves module-level co-expression patterns, and Pathway-informed Optimal Transport, which aligns predicted responses with canonical biological pathways.
  • Figure 3: Functional enrichment analysis of GRC gene clusters across two datasets (§\ref{sec_app:biolog}).
  • Figure 4: Differential attention ($\Delta$Attention) across gene modules for two representative perturbations in the RPE1 dataset (§\ref{sec_app:biolog}).
  • Figure 5: Comparison of Different Gene Embeddings for Clustering Performance Evaluation (§\ref{sec_app:cluster_embb}).
  • ...and 4 more figures
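
The Figure 2 caption describes partitioning genes into $K$ modules via optimal transport. One way such a partition could be computed is a Sinkhorn-style balanced assignment of gene embeddings to cluster prototypes, sketched below. This is an illustrative assumption, not the paper's GRC procedure: the cosine-similarity scores, the `epsilon` temperature, the random embeddings and prototypes, and the function name `sinkhorn_assign` are all hypothetical, and the PPI priors mentioned in the caption are omitted.

```python
import numpy as np


def sinkhorn_assign(scores, epsilon=0.05, n_iters=50):
    """Soft-assign N items to K clusters with roughly equal cluster mass.

    scores: (N, K) similarity matrix. Alternating column and row
    normalization (Sinkhorn iterations) pushes the assignment toward
    a transport plan with uniform marginals.
    """
    n, k = scores.shape
    q = np.exp((scores - scores.max()) / epsilon)   # shift by max for stability
    q /= q.sum()
    for _ in range(n_iters):
        q /= q.sum(axis=0, keepdims=True)           # normalize each cluster column
        q /= k                                      # each cluster holds mass 1/K
        q /= q.sum(axis=1, keepdims=True)           # normalize each gene row
        q /= n                                      # each gene holds mass 1/N
    return q * n                                    # rows now sum to 1


rng = np.random.default_rng(0)
n_genes, dim, n_modules = 60, 16, 4

# Hypothetical gene embeddings and module prototypes, L2-normalized so that
# the dot product is a cosine similarity in [-1, 1].
genes = rng.normal(size=(n_genes, dim))
genes /= np.linalg.norm(genes, axis=1, keepdims=True)
prototypes = rng.normal(size=(n_modules, dim))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

assignment = sinkhorn_assign(genes @ prototypes.T)  # (n_genes, n_modules) soft plan
module_of_gene = assignment.argmax(axis=1)          # hard module label per gene
```

The balancing constraint is the reason to prefer a transport plan over a plain nearest-prototype rule in this sketch: it discourages degenerate solutions in which most genes collapse into a single module.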