Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li

TL;DR

This work addresses the low spatial resolution of spatial transcriptomics by proposing Diff-ST, a cross-modal conditional diffusion model that integrates histology images with gene expression to super-resolve ST maps. The method combines a multi-modal disentangling network with cross-modal adaptive modulation, dynamic cross-attention for cell-to-tissue information, and a co-expression intensity-based gene-correlation graph to jointly reconstruct multiple genes. Empirical results on Xenium, SGE, and Breast-ST show that Diff-ST markedly surpasses existing SR methods and generalizes well to external datasets, enabling more precise in silico ST maps for downstream discovery. The approach advances multi-modal diffusion for genomics-anchored tissue imaging and holds potential for improved spatial-genomic analyses in research and clinical contexts.

Abstract

Recent advances in spatial transcriptomics (ST) make it possible to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with the gene expression of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, integrating histology images and gene expression to produce super-resolved ST maps remains a challenge. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationships among multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.
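
To make the idea of a gene-correlation graph concrete, here is a minimal sketch of one way such a graph could be built from spot-level expression. It is not the paper's network: Pearson correlation is used as a stand-in for the co-expression measure, and the function names, the 0.5 threshold, the placeholder gene names and the toy values are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant gene carries no co-expression signal
    return cov / math.sqrt(var_a * var_b)


def coexpression_graph(expr, threshold=0.5):
    """Build a weighted, undirected gene graph.

    expr: dict mapping gene name -> list of expression values, one per spot.
    An edge is kept when |correlation| >= threshold (hypothetical cut-off).
    """
    genes = sorted(expr)
    graph = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                graph[g][h] = r
                graph[h][g] = r
    return graph


# Toy values for three placeholder genes over four spots (not real measurements).
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 1.0, 3.0, 2.0],
}
graph = coexpression_graph(expr)
```

In this toy input, `GENE_A` and `GENE_B` are perfectly correlated and become neighbours, while `GENE_C` falls below the threshold and stays isolated. A graph network would then pass information along the retained edges so that correlated genes are reconstructed jointly.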

Paper Structure (10 sections, 5 equations, 5 figures, 3 tables)

This paper contains 10 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Conceptual workflow of Diff-ST. The forward diffusion process $q$ perturbs HR ST $\mathbf{x}$ by gradually adding Gaussian noise. The backward diffusion process $p$ denoises the perturbed ST, conditioning on its paired LR version $\mathbf{y}$ and histology image $\mathbf{h}$.
  • Figure 2: Left: Illustration of the multi-modal conditioned reverse diffusion process of Diff-ST. Right: Pipeline of cross-modal (histology-to-ST) adaptive modulation strategy. CL is curriculum learning, while FC denotes fully connected.
  • Figure 3: Visual comparisons at $5\times$ and $10\times$ scales on the Xenium dataset. The ST maps are overlaid on the paired histology image for better visualisation. Note that ANKRD30A, TPD52, GATA3 and SERPINA3 denote different genes.
  • Figure 4: Pipeline of the CIGC-Graph network.
  • Figure 5: Additional visual comparisons at $5\times$ and $10\times$ scales on the Xenium dataset. The ST maps are overlaid on the paired histology image for better visualisation. Note that KRT7, ERBB2, POSTN, LUM, TACSTD2 and CLIC6 denote different genes.
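
The forward process in the Figure 1 caption, which gradually adds Gaussian noise to the HR ST map $\mathbf{x}$, can be sketched with the standard DDPM closed form $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$. This is a minimal illustration under assumptions: the linear noise schedule, its endpoints, the number of steps and all function names are hypothetical and are not taken from the paper. The conditional reverse process (guided by $\mathbf{y}$ and $\mathbf{h}$) is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoints are assumptions."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a 2-D map given as a list of rows."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [[signal * v + noise * rng.gauss(0.0, 1.0) for v in row] for row in x0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

x0 = [[0.0, 1.0], [2.0, 3.0]]  # toy 2x2 "HR ST map" for a single gene
x_early = q_sample(x0, 10, alpha_bars, rng)   # still close to x0
x_late = q_sample(x0, 999, alpha_bars, rng)   # almost pure Gaussian noise
```

Because $\bar{\alpha}_t$ shrinks towards zero as $t$ grows, the signal term vanishes at late steps and the sample approaches standard Gaussian noise, which is the starting point the reverse process denoises from.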