Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics
Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li
TL;DR
This work addresses the low spatial resolution of spatial transcriptomics (ST) by proposing Diff-ST, a cross-modal conditional diffusion model that integrates histology images with gene expression to super-resolve ST maps. The method combines a multi-modal disentangling network with cross-modal adaptive modulation, dynamic cross-attention that extracts hierarchical cell-to-tissue information, and a co-expression intensity-based gene-correlation graph to jointly reconstruct multiple genes. Experiments on Xenium, SGE, and Breast-ST show that Diff-ST outperforms existing super-resolution methods and generalizes well to external datasets, enabling more precise in silico ST maps for downstream discovery. The approach advances multi-modal diffusion modelling for histology-guided spatial genomics and may improve spatial-genomic analyses in research and clinical contexts.
Abstract
Recent advances in spatial transcriptomics (ST) make it possible to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, which hinders in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with the gene expression of profiled tissue spots, but current methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, integrating histology images and gene expression to produce super-resolved ST maps remains a challenge. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationships among multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.
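
To make the idea of a gene-correlation graph concrete, the sketch below builds a weighted gene-gene adjacency matrix from a spots-by-genes expression matrix. It is only an illustration under stated assumptions, not the paper's construction: absolute Pearson correlation stands in for the co-expression measure, and the function name `coexpression_graph`, the `threshold` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def coexpression_graph(expr, threshold=0.5):
    """Build a weighted gene-gene adjacency matrix from expression data.

    expr: array of shape (n_spots, n_genes).
    threshold: hypothetical cutoff on |Pearson r|; weaker edges are dropped.
    Pearson correlation is an assumption made for this sketch, not the
    co-expression measure defined in the paper.
    """
    corr = np.corrcoef(expr, rowvar=False)   # (n_genes, n_genes)
    corr = np.nan_to_num(corr)               # constant genes yield NaN
    weights = np.abs(corr)
    adj = np.where(weights >= threshold, weights, 0.0)
    np.fill_diagonal(adj, 0.0)               # no self-loops
    return adj


# Synthetic example: genes 0 and 1 are co-expressed, gene 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
expr = np.stack(
    [base, base + 0.1 * rng.normal(size=200), rng.normal(size=200)],
    axis=1,
)
adj = coexpression_graph(expr)
```

In this toy setting the only surviving edge links the two co-expressed genes. A graph network operating on such an adjacency matrix could then share information between correlated genes when their maps are reconstructed jointly.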
