A Multi-scale Fused Graph Neural Network with Inter-view Contrastive Learning for Spatial Transcriptomics Data Clustering

Jianping Mei, Siqi Ai, Ye Yuan

TL;DR

The paper tackles spatial domain clustering in spatial transcriptomics by introducing stMFG, a multi-scale interactive fusion graph neural network that fuses spatial and gene-expression views after every graph convolution via layer-wise attention. It couples this fusion with cross-view contrastive learning, spatial regularization, and a ZINB-based reconstruction to learn discriminative yet spatially coherent embeddings. On DLPFC and breast cancer ST datasets, stMFG outperforms state-of-the-art methods, delivering ARI gains up to about 14% on challenging slices and showing robust performance across complex tissue structures. The approach advances spatial domain identification by enabling deep cross-view interactions and biologically informed regularization, with potential for broader application in tissue architecture studies.

Abstract

Spatial transcriptomics enables genome-wide expression analysis within native tissue context, yet identifying spatial domains remains challenging due to complex gene-spatial interactions. Existing methods typically process the spatial and feature views separately and fuse them only at the output level, an "encode-separately, fuse-late" paradigm that limits multi-scale semantic capture and cross-view interaction. To address this, the paper proposes stMFG, a multi-scale interactive fusion graph network that introduces layer-wise cross-view attention to dynamically integrate spatial and gene features after each convolution. The model combines cross-view contrastive learning with spatial constraints to enhance discriminability while maintaining spatial continuity. On DLPFC and breast cancer datasets, stMFG outperforms state-of-the-art methods, achieving up to 14% ARI improvement on certain slices.
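
Since the headline result is reported in ARI, a brief illustration of the metric may help. The adjusted Rand index scores the agreement between a predicted clustering and reference labels, corrected for chance: 1 means identical partitions, and values near 0 mean chance-level agreement. The sketch below is a generic standard-library implementation of the usual pair-counting formula, not the authors' evaluation code; the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two labelings of the same spots."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same spots")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts and their marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of spot pairs that fall in the same cell / row / column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # best achievable agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy usage: cluster names do not matter, only the partition does.
annotation = ["L1", "L1", "L2", "L2", "WM", "WM"]
predicted = [2, 2, 0, 0, 1, 1]
score = adjusted_rand_index(annotation, predicted)  # identical partitions
```

Because the index depends only on which spots are grouped together, a method's cluster IDs never need to be matched to the names used in the manual annotation before scoring.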

Paper Structure

This paper contains 15 sections, 10 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The framework of stMFG. Separate GCN encoders learn multi-scale representations of the spatial and feature views; an attention mechanism fuses the two views' embeddings at each layer, and the fused embedding serves as the input to the next GCN layer of both views. Cross-view contrastive learning and spatial constraints promote high-level discriminative feature learning and semantic alignment while maintaining the spatial continuity of the tissue. The original data is reconstructed with a ZINB decoder, and the final complementary, discriminative embedding is used for spatial domain identification.
  • Figure 2: stMFG identifies spatial domains on the DLPFC dataset. (a) H&E image of slice 151672. (b) Manual annotation. (c)-(f) Spatial domains detected by SCANPY, Spatial-MGCN, MAFN, and stMFG on slice 151672.
  • Figure 3: Results of the hyperparameter sensitivity analysis on slice 151672.
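
The layer-wise fusion described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' implementation: the attention scoring function (a learned vector `q` applied to `tanh` of each view's embedding), the symmetric adjacency normalization, and all names and toy values are hypothetical, and the contrastive, spatial, and ZINB loss terms are omitted.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Standard GCN normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(a_norm, matmul(h, w))]


def attention_fuse(h_spatial, h_feature, q):
    """Fuse two views per spot with softmax attention (assumed scoring: q . tanh(h))."""
    fused = []
    for hs, hf in zip(h_spatial, h_feature):
        scores = [sum(qi * math.tanh(v) for qi, v in zip(q, h)) for h in (hs, hf)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        a_s, a_f = exps[0] / total, exps[1] / total
        fused.append([a_s * x + a_f * y for x, y in zip(hs, hf)])
    return fused


def fused_forward(x, adj_spatial, adj_feature, weights_s, weights_f, queries):
    """Run both view encoders layer by layer, fusing after every convolution."""
    a_s = normalize_adjacency(adj_spatial)
    a_f = normalize_adjacency(adj_feature)
    h, per_layer = x, []
    for w_s, w_f, q in zip(weights_s, weights_f, queries):
        h_s = gcn_layer(a_s, h, w_s)       # spatial-graph view
        h_f = gcn_layer(a_f, h, w_f)       # feature-graph view
        h = attention_fuse(h_s, h_f, q)    # fused embedding feeds the next layer
        per_layer.append(h)
    return per_layer


# Toy data: 3 spots, 2 genes, two layers with fixed (untrained) weights.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj_spatial = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
adj_feature = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = fused_forward(x, adj_spatial, adj_feature,
                           [identity, identity], [identity, identity],
                           [[0.5, -0.5], [0.5, -0.5]])
```

The point of the sketch is the data flow: in contrast to an "encode-separately, fuse-late" design, the fused embedding `h` is fed back into both view encoders at every layer, so each view's next convolution already sees information from the other view.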