CausalGeD: Blending Causality and Diffusion for Spatial Gene Expression Generation

Rabeya Tus Sadia, Md Atik Ahamed, Qiang Cheng

TL;DR

CausalGeD tackles the problem of integrating spatial transcriptomics with scRNA-seq by explicitly modeling gene-gene causal relationships. It introduces a diffusion-based generator augmented by a Causality-Aware Transformer (CAT) that blends autoregression with diffusion to capture regulatory dependencies without predefined networks. Across ten tissue datasets, CausalGeD achieves state-of-the-art performance (5–32% improvements in key metrics like PCC and SSIM) and preserves both global structure and local regulatory signals, aided by a two-headed encoder and a causally masked attention mechanism. This work advances practical spatial gene expression prediction and offers deeper biological insights into gene regulation within spatial contexts, with potential implications for understanding tissue organization and disease progression.
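
To make the idea of causally masked attention concrete, here is a minimal NumPy sketch. It assumes the standard lower-triangular mask, where each token attends only to itself and to earlier tokens. The summary does not specify how CausalGeD lays out its mask, so the helper names `causal_mask` and `masked_attention` and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def causal_mask(n_tokens: int) -> np.ndarray:
    """Boolean mask: entry (i, j) is True when token i may attend to token j.

    Assumption: a plain lower-triangular layout (j <= i).
    """
    return np.tril(np.ones((n_tokens, n_tokens), dtype=bool))


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed positions masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed positions get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: 4 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
n_tokens, dim = 4, 8
q, k, v = (rng.normal(size=(n_tokens, dim)) for _ in range(3))
out, weights = masked_attention(q, k, v, causal_mask(n_tokens))
```

Every row of `weights` sums to one and is zero above the diagonal, so no token draws on tokens that come later in the sequence.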

Abstract

The integration of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) data is crucial for understanding gene expression in spatial context. Existing methods for such integration have limited performance, with structural similarity often below 60%. We attribute this limitation to the failure to consider causal relationships between genes. We present CausalGeD, which combines diffusion and autoregressive processes to leverage these relationships. By generalizing the Causal Attention Transformer from image generation to gene expression data, our model captures regulatory mechanisms without predefined relationships. Across 10 tissue datasets, CausalGeD outperformed state-of-the-art baselines by 5–32% in key metrics, including Pearson's correlation and structural similarity, advancing both technical and biological insights.
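
As an illustration of one of the metrics named above, the following sketch computes Pearson's correlation per gene between predicted and measured expression. It assumes both are matrices with spots (or cells) as rows and genes as columns. The function name `per_gene_pcc`, the toy data, and the per-gene aggregation are assumptions made for illustration; the summary does not state how the paper aggregates the metric.

```python
import numpy as np


def per_gene_pcc(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Pearson correlation for each gene (column) across spots (rows).

    Assumes every gene has non-zero variance in both matrices;
    a constant column would produce a division by zero.
    """
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    numerator = (pred_c * true_c).sum(axis=0)
    denominator = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return numerator / denominator


# Toy data: 50 spots, 3 genes; predictions are the truth plus small noise.
rng = np.random.default_rng(0)
true_expr = rng.normal(size=(50, 3))
pred_expr = true_expr + 0.1 * rng.normal(size=(50, 3))

pcc = per_gene_pcc(pred_expr, true_expr)
mean_pcc = float(pcc.mean())  # one possible summary across genes
```

Averaging over genes is only one way to summarize the result; a median or a per-dataset breakdown would be equally reasonable under these assumptions.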

Paper Structure

This paper contains 28 sections, 4 equations, 10 figures, 3 tables, 2 algorithms.

Figures (10)

  • Figure 1: Top 5 gene pairs with strongest Granger causality relationships from randomly selected MC data (Section 4). All pairs show highly significant causal relationships (p-values < 1e-16), highlighting the importance of modeling gene-gene causality.
  • Figure 2: Architecture overview of CausalGeD. The framework comprises both training and inference processes, implemented through key components: latent space module, noisy input construction, diffusion-autoregression integration, causal attention mask used in causality-aware Transformer (CAT), and an inference module for processing noisy inputs.
  • Figure 3: Diffusion blended with autoregression.
  • Figure 4: Low-dimensional UMAP visualizations of predicted vs. real gene expression for the proposed method and the best-performing baseline. CausalGeD predictions (blue) closely match the real gene expression data (orange) across ten datasets, while the baseline shows notable discrepancies.
  • Figure 5: Heat map from hierarchical clustering, visualizing the similarity between the predicted genes and the true gene labels.
  • ...and 5 more figures