Boundary-Guided Learning for Gene Expression Prediction in Spatial Transcriptomics

Mingcheng Qu, Yuncong Wu, Donglin Di, Anyang Su, Tonghua Su, Yang Song, Lei Fan

TL;DR

The paper tackles the problem of predicting spatial gene expression from whole-slide images in spatial transcriptomics, addressing the limitation that prior methods often overlook boundary-based cellular morphology and microenvironment cues. It introduces BG-TRIPLEX, a three-branch architecture (spot, in-context, global) that integrates boundary information via Multi-Head Cross-Attention, using PiDiNet for edges and HoverNet for nuclei, with a global positional encoding (APEG) to capture tissue layout. The model is trained with a fused-output MSE loss plus branch-guided losses, and shows notable improvements in PCC across HER2ST, STNet, and Skin datasets, with demonstrated generalization to Visium data. These findings highlight boundary features as a key driver of accurate gene-expression prediction and offer a geometry-aware approach for pathology-informed transcriptomics analyses.

Abstract

Spatial transcriptomics (ST) has emerged as an advanced technology that provides spatial context to gene expression. Recently, deep learning-based methods have shown the capability to predict gene expression from WSI data using ST data. Existing approaches typically extract features from images and the neighboring regions using pretrained models, and then develop methods to fuse this information to generate the final output. However, these methods often fail to account for the cellular structure similarity, cellular density and the interactions within the microenvironment. In this paper, we propose a framework named BG-TRIPLEX, which leverages boundary information extracted from pathological images as guiding features to enhance gene expression prediction from WSIs. Specifically, our model consists of three branches: the spot, in-context and global branches. In the spot and in-context branches, boundary information, including edge and nuclei characteristics, is extracted using pretrained models. These boundary features guide the learning of cellular morphology and the characteristics of microenvironment through Multi-Head Cross-Attention. Finally, these features are integrated with global features to predict the final output. Extensive experiments were conducted on three public ST datasets. The results demonstrate that our BG-TRIPLEX consistently outperforms existing methods in terms of Pearson Correlation Coefficient (PCC). This method highlights the crucial role of boundary features in understanding the complex interactions between WSI and gene expression, offering a promising direction for future research.
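
The abstract reports results in terms of PCC. As a rough illustration of that metric, the sketch below computes a Pearson correlation per gene across spots with NumPy. The function name `gene_wise_pcc`, the (spots × genes) array layout, and the `eps` stabilizer are assumptions made for this example; the paper's exact aggregation over genes and folds is not described in this section.

```python
import numpy as np

def gene_wise_pcc(pred, target, eps=1e-8):
    """Pearson correlation per gene (column), computed across spots (rows).

    pred, target: arrays of shape (num_spots, num_genes).
    eps is a hypothetical stabilizer that avoids division by zero
    for genes with constant expression.
    """
    pred_c = pred - pred.mean(axis=0, keepdims=True)
    targ_c = target - target.mean(axis=0, keepdims=True)
    cov = (pred_c * targ_c).sum(axis=0)
    denom = np.sqrt((pred_c ** 2).sum(axis=0) * (targ_c ** 2).sum(axis=0)) + eps
    return cov / denom

# Toy data: a prediction that is an affine function of the target
# should correlate almost perfectly with it.
rng = np.random.default_rng(0)
target = rng.normal(size=(50, 3))
pred = 2.0 * target + 1.0
print(gene_wise_pcc(pred, target))
```

Because PCC is invariant to positive scaling and shifts, it rewards predictions that track the spatial pattern of a gene even when absolute expression levels are off, which is why it is usually reported alongside an error measure such as MSE.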

Paper Structure

This paper contains 15 sections, 12 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of our BG-TRIPLEX. Our method extracts boundary information to guide the capture of cellular morphology and the characteristics of both the target spot and its in-context regions in histology images. These features are then fused to predict gene expression levels.
  • Figure 2: The architecture of BG-TRIPLEX. Our model consists of three main branches: Spot Branch, In-context Branch and Global Branch. In the spot and in-context branches, boundary features are extracted from the target patch and its in-context regions using pretrained models $f_e$ and $f_n$. These boundary features guide the capture of cellular characteristics through Multi-Head Cross-Attention. Finally, features from all three branches are fused to predict the gene expression.
  • Figure 3: The Multi-Head Cross-Attention for boundary-guided learning. It involves generating the query $Q$ from the image features and the key $K$ and value $V$ from the boundary features. The boundary features guide the image features through cross-attention operations, after which the outputs are concatenated and passed through a LayerNorm layer.
  • Figure 4: Prediction visualizations on the STNet dataset. From left to right: raw WSI, ground-truth gene expression, ST-Net, TRIPLEX, and our BG-TRIPLEX.
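
The boundary-guided cross-attention described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `boundary_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the token counts, and the feature width are all assumptions. It also assumes that "concatenated" refers to the attention heads, and it omits the learned LayerNorm parameters, any output projection, and any residual connection the real model may use.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # LayerNorm over the feature axis, without learned scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def boundary_cross_attention(img_feats, bnd_feats, w_q, w_k, w_v, num_heads):
    """Queries come from image tokens; keys and values from boundary tokens."""
    n_img, d = img_feats.shape
    n_bnd = bnd_feats.shape[0]
    d_head = d // num_heads

    def split_heads(x, n):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(img_feats @ w_q, n_img)
    k = split_heads(bnd_feats @ w_k, n_bnd)
    v = split_heads(bnd_feats @ w_v, n_bnd)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, n_img, n_bnd)
    attn = softmax(scores, axis=-1)
    heads = attn @ v                                      # (H, n_img, d_head)

    # Concatenate the heads back to width d, then normalize.
    concat = heads.transpose(1, 0, 2).reshape(n_img, d)
    return layer_norm(concat), attn

# Toy example with hypothetical sizes: 16 image tokens, 16 boundary tokens.
rng = np.random.default_rng(0)
d, num_heads = 64, 4
img = rng.normal(size=(16, d))
bnd = rng.normal(size=(16, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, attn = boundary_cross_attention(img, bnd, w_q, w_k, w_v, num_heads)
print(out.shape, attn.shape)
```

Taking the query from the image features means the output keeps one token per image position, so the boundary features (edges or nuclei) act purely as guidance on what each image token attends to rather than replacing it.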