When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering
Jiangkai Long, Yanran Zhu, Chang Tang, Kun Sun, Yuanyuan Liu, Xuesong Yan
TL;DR
SemST is a semantic-guided framework that couples LLM-derived gene semantics with graph-based spatial representations to improve clustering of spatial transcriptomics data. It builds two GCN views (spatial proximity and expression similarity) and applies a Fine-grained Semantic Modulation (FSM) module, which uses spot-specific biological priors to apply an affine transformation to the spatial features. With this design, SemST achieves state-of-the-art clustering across nine datasets. The approach is further strengthened by self-supervised ZINB reconstruction and auxiliary losses that preserve data structure while reducing cross-view redundancy. Importantly, the FSM module is plug-and-play: it yields consistent gains when integrated into baseline models, which highlights the practical value of incorporating gene-symbol semantics into spatial analyses. Overall, SemST improves both biological interpretability and clustering performance by letting genes effectively “speak” within the tissue context.
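The two graph views mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `knn_adjacency` and `gcn_layer`, the use of Euclidean k-nearest neighbors for both views, the value of `k`, and the random toy data are all assumptions made for illustration. The propagation step uses the standard symmetric normalization of a GCN layer.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric binary kNN graph over the rows of X (Euclidean distance, assumed)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # a spot is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]                   # k nearest neighbors per spot
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                             # symmetrize

def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy data (hypothetical): 8 spots, 2-D coordinates, 5 genes.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(8, 2))
expr = rng.normal(size=(8, 5))
W = rng.normal(size=(5, 3))

A_spatial = knn_adjacency(coords, k=3)   # view 1: spatial proximity
A_expr = knn_adjacency(expr, k=3)        # view 2: expression similarity
Z_spatial = gcn_layer(A_spatial, expr, W)
Z_expr = gcn_layer(A_expr, expr, W)
```

Both views share the same input features here and differ only in the graph used for message passing. Whether the two views share weights in SemST is not stated in this summary.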
Abstract
Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat genes as isolated numerical features, ignoring the rich biological semantics encoded in their symbols. This limits how well such models capture critical biological characteristics. To overcome this limitation, we present SemST, a semantic-guided deep learning framework for spatial transcriptomics data clustering. SemST leverages Large Language Models (LLMs) to enable genes to "speak" through their symbolic meanings, transforming the gene set within each tissue spot into a biologically informed embedding. These embeddings are then fused with the spatial neighborhood relationships captured by Graph Neural Networks (GNNs), integrating biological function with spatial structure. We further introduce the Fine-grained Semantic Modulation (FSM) module to better exploit these biological priors. The FSM module learns spot-specific affine transformations through which the semantic embeddings calibrate the spatial features element-wise, dynamically injecting high-order biological knowledge into the spatial context. Extensive experiments on public spatial transcriptomics datasets show that SemST achieves state-of-the-art clustering performance. Crucially, the FSM module is plug-and-play, consistently improving performance when integrated into other baseline methods.
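The spot-specific affine modulation described above resembles feature-wise linear modulation. The sketch below shows one way it could look, under stated assumptions. The function names `fsm_modulate` and `init_params`, the single linear map used to predict the scale and shift, and the `1 + gamma` parameterization (so that zero weights leave the spatial features unchanged) are all hypothetical. The abstract specifies only that the transformation is affine, spot-specific, and element-wise.

```python
import numpy as np

def fsm_modulate(spatial_feats, semantic_emb, params):
    """Element-wise affine calibration of spatial features by semantic embeddings.

    spatial_feats: (n_spots, d_spatial) features from the graph encoder
    semantic_emb:  (n_spots, d_sem) LLM-derived embedding of each spot's genes
    """
    # Each spot gets its own scale (gamma) and shift (beta) vectors.
    gamma = semantic_emb @ params["W_gamma"] + params["b_gamma"]
    beta = semantic_emb @ params["W_beta"] + params["b_beta"]
    # Assumed parameterization: scale around 1 so zero weights give the identity.
    return (1.0 + gamma) * spatial_feats + beta

def init_params(d_sem, d_spatial, rng, scale):
    """Hypothetical initializer; scale=0.0 yields an identity modulation."""
    return {
        "W_gamma": scale * rng.normal(size=(d_sem, d_spatial)),
        "b_gamma": np.zeros(d_spatial),
        "W_beta": scale * rng.normal(size=(d_sem, d_spatial)),
        "b_beta": np.zeros(d_spatial),
    }

# Toy data (hypothetical): 6 spots, 4 spatial features, 8 semantic dimensions.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
S = rng.normal(size=(6, 8))

H_same = fsm_modulate(H, S, init_params(8, 4, rng, scale=0.0))  # identity case
H_mod = fsm_modulate(H, S, init_params(8, 4, rng, scale=0.1))   # modulated case
```

Because the module only needs per-spot features and per-spot semantic embeddings of matching row count, a layer of this shape can be attached to the output of another encoder, which is consistent with the plug-and-play use described in the abstract.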
