A Multi-scale Fused Graph Neural Network with Inter-view Contrastive Learning for Spatial Transcriptomics Data Clustering
Jianping Mei, Siqi Ai, Ye Yuan
TL;DR
The paper tackles spatial domain clustering in spatial transcriptomics (ST) by introducing stMFG, a multi-scale interactive fusion graph neural network that fuses the spatial and gene-expression views after every graph convolution via layer-wise attention. It couples this fusion with cross-view contrastive learning, spatial regularization, and a zero-inflated negative binomial (ZINB) reconstruction to learn discriminative yet spatially coherent embeddings. On the DLPFC and breast cancer ST datasets, stMFG outperforms state-of-the-art methods, with Adjusted Rand Index (ARI) gains of up to about 14% on challenging slices and robust performance across complex tissue structures. The approach advances spatial domain identification by enabling deep cross-view interaction and biologically informed regularization, with potential for broader application in tissue architecture studies.
Abstract
Spatial transcriptomics enables genome-wide expression analysis within native tissue context, yet identifying spatial domains remains challenging because gene expression and spatial location interact in complex ways. Existing methods typically process the spatial and feature views separately and fuse them only at the output level, an "encode-separately, fuse-late" paradigm that limits multi-scale semantic capture and cross-view interaction. To address this, stMFG is proposed: a multi-scale interactive fusion graph network that introduces layer-wise cross-view attention to dynamically integrate spatial and gene features after each graph convolution. The model combines cross-view contrastive learning with spatial constraints to enhance discriminability while maintaining spatial continuity. On the DLPFC and breast cancer datasets, stMFG outperforms state-of-the-art methods, achieving up to a 14% ARI improvement on certain slices.
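
The contrast between late fusion and layer-wise fusion can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the attention form, the layer sizes, or how the fused representation is passed on, so the tanh-scored softmax attention, the choice to feed the fused embedding back into both branches, and all names (`normalize_adj`, `attention_fuse`, `forward`, `w_s`, `w_f`, `w_att`, `q`) are assumptions made for illustration. Contrastive learning, spatial regularization, and the ZINB decoder are omitted.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a non-negative adjacency matrix."""
    a = adj + np.eye(adj.shape[0])          # add self-loops so every degree is > 0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def attention_fuse(h_spatial, h_feature, w_att, q):
    """Fuse two views with per-spot softmax attention (assumed form, not from the paper)."""
    views = np.stack([h_spatial, h_feature], axis=1)   # (n_spots, 2, d)
    scores = np.tanh(views @ w_att) @ q                # (n_spots, 2)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # weights sum to 1 per spot
    fused = (alpha[:, :, None] * views).sum(axis=1)    # (n_spots, d)
    return fused, alpha

def forward(x, adj_spatial, adj_feature, params):
    """Graph convolution on each view, followed by fusion after every layer."""
    a_s = normalize_adj(adj_spatial)
    a_f = normalize_adj(adj_feature)
    h_s = h_f = x
    fused = x
    for layer in params:
        h_s = np.maximum(a_s @ h_s @ layer["w_s"], 0.0)  # spatial-graph convolution + ReLU
        h_f = np.maximum(a_f @ h_f @ layer["w_f"], 0.0)  # feature-graph convolution + ReLU
        fused, _ = attention_fuse(h_s, h_f, layer["w_att"], layer["q"])
        # Assumption: the fused embedding is the input to both branches in the next
        # layer, so the views interact at every depth rather than only at the output.
        h_s = h_f = fused
    return fused
```

In a late-fusion baseline, `attention_fuse` would be called once after the loop; here it runs inside the loop, which is the structural difference the abstract describes.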
