stMCDI: Masked Conditional Diffusion Model with Graph Neural Network for Spatial Transcriptomics Data Imputation
Xiaoyu Li, Wenwen Min, Shunfang Wang, Changmiao Wang, Taosheng Xu
TL;DR
This work tackles the problem of extensive missing values in high-resolution spatial transcriptomics (ST) data by exploiting the spatial coordinates of spots through a graph neural network (GNN) encoder and a masked self-supervised training regime. It introduces stMCDI, a masked conditional diffusion model that uses the unmasked data as prior conditioning information and a cross-attention-enhanced UNet to impute missing gene expression values while preserving the data distribution. The approach achieves state-of-the-art performance on six real ST datasets against fourteen baselines, with ablations confirming the value of the GNN encoder, the masking strategy, and the conditioning mechanism. The results highlight the practical potential of combining graph-based spatial encoding with conditional diffusion for accurate, distribution-preserving imputation in spatial omics, and point to future directions including multi-modal integration and improved downstream analysis.
Abstract
Spatially resolved transcriptomics represents a significant advancement in single-cell analysis by offering both gene expression data and their corresponding physical locations. However, this high spatial resolution comes at a cost: the resulting cellular-level spatial transcriptomic data suffers from a high incidence of missing values. Furthermore, most existing imputation methods either overlook the spatial information between spots or compromise the overall gene expression data distribution. To address these challenges, we focus on effectively using the spatial location information in spatial transcriptomic data to impute missing values while preserving the overall data distribution. We introduce stMCDI, a novel conditional diffusion model for spatial transcriptomics data imputation. Its denoising network is trained with randomly masked portions of the data as guidance, while the unmasked data serve as conditions. Additionally, it uses a GNN encoder to integrate spatial position information, thereby enhancing model performance. Experiments on spatial transcriptomics datasets show how our method performs relative to existing approaches.
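The masked training setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mask ratio, the treatment of zeros as missing values, and the DDPM-style noising with a single `alpha_bar` value are all assumptions made for illustration. The sketch splits the observed entries of an expression matrix into a condition set (left unmasked) and a target set (masked and noised), which is the kind of input a conditional denoising network would be trained on.

```python
import numpy as np


def split_observed_entries(expr, mask_ratio=0.3, seed=0):
    """Split observed entries into condition and target masks.

    Assumption: zeros in `expr` denote missing values; `mask_ratio` is a
    hypothetical hyperparameter, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    observed = expr != 0
    hide = rng.random(expr.shape) < mask_ratio
    target_mask = observed & hide    # entries the model must reconstruct
    cond_mask = observed & ~hide     # entries passed to the model as conditions
    return cond_mask, target_mask


def noisy_training_input(expr, cond_mask, target_mask, alpha_bar, rng):
    """Noise only the target entries, using the standard DDPM forward step
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps (assumed here)."""
    noise = rng.standard_normal(expr.shape)
    noised = np.sqrt(alpha_bar) * expr + np.sqrt(1.0 - alpha_bar) * noise
    x_target = np.where(target_mask, noised, 0.0)  # noisy masked portion
    x_cond = np.where(cond_mask, expr, 0.0)        # clean unmasked portion
    return x_cond, x_target, noise


# Toy spots-by-genes matrix; zeros stand in for missing values.
expr = np.array([[0.0, 1.0, 2.0],
                 [3.0, 0.0, 4.0]])
cond_mask, target_mask = split_observed_entries(expr, mask_ratio=0.5, seed=1)
x_cond, x_target, noise = noisy_training_input(
    expr, cond_mask, target_mask, alpha_bar=0.9, rng=np.random.default_rng(0)
)
```

In such a setup, a denoising network would receive `x_cond` and `x_target` and be trained to predict `noise` on the positions selected by `target_mask`. At imputation time, all observed entries would serve as conditions and the truly missing entries would be generated.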
