When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering

Jiangkai Long, Yanran Zhu, Chang Tang, Kun Sun, Yuanyuan Liu, Xuesong Yan

TL;DR

SemST is a semantic-guided framework that couples LLM-derived gene semantics with graph-based spatial representations to improve clustering of spatial transcriptomics data. It builds two GCN views (spatial proximity and expression similarity) and applies the Fine-grained Semantic Modulation (FSM) module, which affine-modulates spatial features with spot-specific biological priors; with this design, SemST achieves state-of-the-art clustering across nine datasets. The approach is further strengthened by self-supervised ZINB reconstruction and by auxiliary losses that preserve data structure while reducing cross-view redundancy. The FSM module is also plug-and-play: it yields consistent gains when integrated into baseline models, which highlights the practical value of incorporating gene-symbol semantics into spatial analyses. Overall, SemST improves biological interpretability and clustering performance by letting genes effectively “speak” within the tissue context.
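
The ZINB reconstruction objective mentioned above can be illustrated with the standard zero-inflated negative binomial likelihood for a single count. This is a minimal sketch of the textbook formula, not the authors' implementation: the function names and the parameter names `mu` (mean), `theta` (dispersion), and `pi` (extra-zero probability) are assumptions chosen for illustration, and the summary does not say how SemST parameterizes or weights this loss.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under ZINB.
    pi is the probability of an extra (dropout) zero, with 0 <= pi < 1."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        prob = pi + (1.0 - pi) * nb
    else:
        # A positive count can only come from the NB component.
        prob = (1.0 - pi) * nb
    return -math.log(prob)

# Zero inflation makes an observed zero more likely, so its loss is smaller.
loss_with_dropout = zinb_nll(0, mu=3.0, theta=2.0, pi=0.3)
loss_without_dropout = zinb_nll(0, mu=3.0, theta=2.0, pi=0.0)
```

In a reconstruction setting, a decoder would typically predict `mu`, `theta`, and `pi` for every spot and gene and average this quantity over the expression matrix; that wiring is assumed here and not described in the summary.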

Abstract

Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat genes as isolated numerical features, ignoring the rich biological semantics encoded in their symbols. This prevents a truly deep understanding of critical biological characteristics. To overcome this limitation, we present SemST, a semantic-guided deep learning framework for spatial transcriptomics data clustering. SemST leverages Large Language Models (LLMs) to enable genes to "speak" through their symbolic meanings, transforming gene sets within each tissue spot into biologically informed embeddings. These embeddings are then fused with the spatial neighborhood relationships captured by Graph Neural Networks (GNNs), achieving a coherent integration of biological function and spatial structure. We further introduce the Fine-grained Semantic Modulation (FSM) module to optimally exploit these biological priors. The FSM module learns spot-specific affine transformations that empower the semantic embeddings to perform an element-wise calibration of the spatial features, thus dynamically injecting high-order biological knowledge into the spatial context. Extensive experiments on public spatial transcriptomics datasets show that SemST achieves state-of-the-art clustering performance. Crucially, the FSM module exhibits plug-and-play versatility, consistently improving the performance when integrated into other baseline methods.
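
To make the idea of a spot-specific affine transformation concrete, the sketch below shows one common way to realize it: the semantic embedding of a spot is mapped to a per-feature scale and shift, which are then applied element-wise to that spot's spatial feature vector. This is an assumption-laden illustration, not the paper's FSM module: the function names, the single linear maps, and the weight names (`w_gamma`, `b_gamma`, `w_beta`, `b_beta`) are hypothetical, and the abstract does not specify the actual layers.

```python
def linear(weights, bias, vec):
    """Dense layer without activation: weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def fsm_modulate(spatial_feat, semantic_emb, w_gamma, b_gamma, w_beta, b_beta):
    """Element-wise affine calibration of one spot's spatial features.
    The scale (gamma) and shift (beta) are predicted from that spot's
    semantic embedding, so every spot gets its own transformation."""
    gamma = linear(w_gamma, b_gamma, semantic_emb)
    beta = linear(w_beta, b_beta, semantic_emb)
    return [g * h + b for g, h, b in zip(gamma, spatial_feat, beta)]

# Toy example: a 2-dimensional spatial feature and a 3-dimensional semantic embedding.
spatial_feat = [1.0, 2.0]
semantic_emb = [0.5, -1.0, 2.0]
w_gamma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_gamma = [1.0, 1.0]
w_beta = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
b_beta = [0.0, 0.5]

modulated = fsm_modulate(spatial_feat, semantic_emb, w_gamma, b_gamma, w_beta, b_beta)
```

With zero weights, a scale bias of one, and a shift bias of zero, the function returns the spatial features unchanged, which is one reason a module of this shape can be added to an existing model without disturbing it at initialization.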

Paper Structure

This paper contains 37 sections, 14 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The framework of the proposed SemST. We integrate three distinct sources of information as inputs. Two graphs are constructed from expression and spatial data, and multi-view features are propagated and fused using GCNs. Gene symbols are formatted into prompts and fed into an LLM to extract biologically informed embeddings, which are then used by the proposed FSM module to guide spatial feature refinement. Model optimization is achieved through ZINB-based data reconstruction.
  • Figure 2: Visualization of manual annotations and clustering results produced by SemST and eight other methods on the DLPFC slice #151672, MBA, and HBC datasets. Color indicates spatial domains.
  • Figure 3: UMAP visualization of semantic embeddings generated by the LLM and BERT on the DLPFC slice #151508 dataset, colored according to manual annotations.
  • Figure 4: Impact of $k_{g}$ (the number of top-expressed genes input to the LLM) on clustering performance across three datasets, evaluated by ARI, NMI, ACC, and F1.
  • Figure 5: Visualization of manual annotations and clustering results produced by SemST and eight other methods on the ME and MVC datasets. Color indicates spatial domains.
  • ...and 3 more figures
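
The captions for Figures 1 and 4 describe selecting the $k_{g}$ top-expressed genes of a spot and formatting their symbols into a prompt for the LLM. A minimal sketch of that step follows. The function names, the toy expression values, and especially the prompt wording are hypothetical; the actual template used by SemST is not given in this summary.

```python
def top_expressed_genes(expression, k_g):
    """Return the k_g gene symbols with the highest expression in one spot.
    Ties are broken alphabetically so the result is deterministic."""
    ranked = sorted(expression.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k_g]]

def build_prompt(genes):
    """Format gene symbols into a text prompt (hypothetical template)."""
    return "Highly expressed genes in this tissue spot: " + ", ".join(genes) + "."

# Toy spot with made-up expression values for four genes.
spot_expression = {"MBP": 12.0, "SNAP25": 30.0, "GFAP": 7.5, "PLP1": 12.0}

genes = top_expressed_genes(spot_expression, k_g=3)
prompt = build_prompt(genes)
```

The resulting string would then be passed to a language model to obtain the spot's semantic embedding; which model and which pooling are used is outside what these captions state.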