GPO-VAE: Modeling Explainable Gene Perturbation Responses utilizing GRN-Aligned Parameter Optimization

Seungheun Baek, Soyon Park, Yan Ting Chok, Mogan Gim, Jaewoo Kang

TL;DR

GPO-VAE tackles the explainability gap in perturbation-response modeling by aligning the latent perturbation effects with a gene regulatory network via GRN-aligned parameter optimization. The model extends CRADLE-VAE with a square GRN-weight matrix $W$ and a K-hop accumulation $T_K$, guided by optimal-transport-based differential expression and sparsity regularization to yield interpretable GRNs while maintaining strong predictive accuracy. Across three Perturb-Seq datasets, GPO-VAE achieves state-of-the-art perturbation-response predictions and yields sparse, biologically meaningful GRNs that align with known regulatory pathways; it also demonstrates robust generalization to unseen perturbations. These results advance explainable biological AI by providing mechanistic, data-driven networks that support target discovery and interpretation in cellular perturbations, with open-source availability for reproducibility.
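To make the K-hop accumulation concrete, the following is a minimal sketch in Python with NumPy. It assumes that $T_K$ sums the first $K$ powers of the GRN-weight matrix, $T_K = \sum_{k=1}^{K} W^k$, and that $W_{ij}$ is the weight of the edge from gene $i$ to gene $j$. Neither assumption is stated in the summary above, so the paper's exact definition may differ. The helper name `k_hop_accumulation` and the toy matrix are hypothetical.

```python
import numpy as np


def k_hop_accumulation(W: np.ndarray, K: int) -> np.ndarray:
    """Return W + W^2 + ... + W^K (an assumed form of T_K, not the paper's code)."""
    if W.ndim != 2 or W.shape[0] != W.shape[1]:
        raise ValueError("W must be a square matrix")
    if K < 1:
        raise ValueError("K must be at least 1")
    T = np.zeros_like(W, dtype=float)
    power = np.eye(W.shape[0])
    for _ in range(K):
        power = power @ W  # after k iterations this holds W^k
        T += power         # accumulate the effect of paths of length k
    return T


# Toy 3-gene chain (illustrative only): gene 0 -> gene 1 -> gene 2
W = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
])

T1 = k_hop_accumulation(W, 1)  # direct edges only
T2 = k_hop_accumulation(W, 2)  # direct edges plus 2-hop paths
```

In this toy chain, `T1[0, 2]` is 0 because gene 0 has no direct edge to gene 2, while `T2[0, 2]` is 1 because the 2-hop path through gene 1 is accumulated. Under the stated assumption, this is how a perturbation could reach genes it does not regulate directly.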

Abstract

Motivation: Predicting cellular responses to genetic perturbations is essential for understanding biological systems and developing targeted therapeutic strategies. While variational autoencoders (VAEs) have shown promise in modeling perturbation responses, their limited explainability poses a significant challenge, as the learned features often lack clear biological meaning. Yet model explainability is one of the most important aspects of biological AI. One of the most effective ways to achieve explainability is to incorporate gene regulatory networks (GRNs) into the design of deep learning models such as VAEs. GRNs capture the underlying causal relationships between genes and can explain the transcriptional responses caused by genetic perturbation treatments.

Results: We propose GPO-VAE, an explainable VAE enhanced by GRN-aligned Parameter Optimization that explicitly models gene regulatory networks in the latent space. Our key approach is to optimize the learnable parameters related to latent perturbation effects towards GRN-aligned explainability. Experimental results on perturbation prediction show that our model achieves state-of-the-art performance in predicting transcriptional responses across multiple benchmark datasets. Furthermore, results on the GRN inference task show that our model generates more meaningful GRNs than other methods. According to qualitative analysis, GPO-VAE can construct biologically explainable GRNs that align with experimentally validated regulatory pathways. GPO-VAE is available at https://github.com/dmis-lab/GPO-VAE

Paper Structure

This paper contains 51 sections, 18 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Model Overview of GPO-VAE. The model consists of three encoder modules: latent perturbation encoder, latent artifact encoder, and latent basal state encoder, and a decoder. The perturbation encoder utilizes a gene regulatory network.
  • Figure 2: Comparison of perturbation encoder between previous models and our model. Unlike previous models using randomly sampled sparse latent offsets with trainable parameters, our model utilizes GRN-aligned parameter optimization for explainability.
  • Figure 3: GRN topology analysis of GPO loss objective. Red borderlines separate perturbation and extended gene groups. The coloring scheme of the cells is based on the edge weights in each $\hat{W}$. White, grey, and black cells denote the absence of an edge (<0.5), an initialized parameter (=0.5), and the presence of an edge (>0.5), respectively.
  • Figure 4: Case study and pathway analysis of GRN subnetworks involved in the interaction with three cancer proteins: KRAS, Myc, NTRK1. Solid lines indicate one-hop directed edges; dotted lines indicate 2- or 3-hop edges that form through the inclusion of extended genes.
  • Figure 5: Quantitative performance of the model on unseen perturbations for TWISTNB, RPL26, and RPL34 genes. For TWISTNB, the ATE Pearson correlation, ATE R², and Jaccard similarity are 0.792, 0.578, and 0.587, respectively. For RPL26, the corresponding values are 0.886, 0.775, and 0.724, and for RPL34, they are 0.919, 0.831, and 0.667.
  • ...and 1 more figure