FEAST: Fully Connected Expressive Attention for Spatial Transcriptomics

Taejin Jeong, Joohyeok Kim, Jinyeong Kim, Chanyoung Kim, Seong Jae Hwang

Abstract

Spatial Transcriptomics (ST) provides spatially resolved gene expression, offering crucial insights into tissue architecture and complex diseases. However, its prohibitive cost limits widespread adoption, which has drawn significant attention to inferring spatial gene expression from readily available whole slide images. While graph neural networks have been proposed to model interactions between tissue regions, their reliance on pre-defined sparse graphs prevents them from considering all potentially interacting spot pairs, a structural limitation in capturing complex biological relationships. To address this, we propose FEAST (Fully connected Expressive Attention for Spatial Transcriptomics), an attention-based framework that models the tissue as a fully connected graph, enabling the consideration of all pairwise interactions. To better reflect biological interactions, we introduce negative-aware attention, which models both excitatory and inhibitory interactions, capturing essential negative relationships that standard attention often overlooks. Furthermore, to mitigate the information loss from truncated or ignored context in standard spot image extraction, we introduce an off-grid sampling strategy that gathers additional images from intermediate regions, allowing the model to capture a richer morphological context. Experiments on public ST datasets show that FEAST surpasses state-of-the-art methods in gene expression prediction while providing biologically plausible attention maps that clarify positive and negative interactions. Our code is available at https://github.com/starforTJ/FEAST.
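
As a rough illustration of the off-grid sampling idea, the sketch below places pseudo-spot centers halfway between neighboring spots. The abstract only says that additional images are gathered from intermediate regions, so the square integer grid, the midpoint rule, and the name `pseudo_spot_centers` are all assumptions made for this example, not the authors' procedure.

```python
def pseudo_spot_centers(spots):
    """Return midpoints between neighboring spots on a square grid.

    spots: iterable of (row, col) integer grid indices (hypothetical layout).
    Midpoints are taken between horizontal, vertical, and diagonal
    neighbors; this rule is an assumption made for illustration only.
    """
    occupied = set(spots)
    centers = set()
    for (r, c) in occupied:
        # Look only "forward" so that each neighboring pair is visited once.
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            if (r + dr, c + dc) in occupied:
                centers.add((r + dr / 2, c + dc / 2))
    return sorted(centers)


# A 2 x 2 block of spots: four edge midpoints plus one shared cell center.
grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
centers = pseudo_spot_centers(grid)
```

In a full pipeline, each returned center would be mapped to pixel coordinates and used to crop an extra image patch. That step is omitted here because the patch size and coordinate mapping are not given in this section.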

Paper Structure

This paper contains 24 sections, 5 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Conceptual overview of our proposed framework, illustrating how it addresses key challenges of prior ST methods. Existing approaches are limited by sparse graphs that miss potential interactions, standard positive-only relationships in attention that omit inhibitory interactions, and sparse information that leads to information loss in off-grid regions. Our framework overcomes these limitations by employing a fully connected graph to consider all-pairs interactions, introducing a novel negative-aware attention to explicitly model both positive and inhibitory relationships, and utilizing pseudo-spots to capture and leverage off-grid information.
  • Figure 2: The overall architecture of FEAST. The framework first extracts features and spot distance ($\mathbf{B}_h$) from the input WSI. The features are then processed through $L$ stacked hierarchical attention layers. Each layer consists of two stages: (1) a FEAST Block applied to local $k$-nearest neighbors (including pseudo-spots) to absorb spatial context, followed by (2) a second FEAST Block applied to the original spots for global self-attention. The final representations are passed to an MLP head for gene expression prediction.
  • Figure 3: Detailed architecture of the FEAST Block. This block, which serves as the core attention mechanism in our framework, explicitly models both positive (excitatory) and negative (inhibitory) relationships. The raw feature score ($qk^T$) and its inversion ($-qk^T$) are used to compute positive and negative scores, respectively, both incorporating the same positional bias ($\mathbf{B}_h$). After softmax normalization ($\sigma$), the final attention weight is computed by subtracting the negative weight from the positive weight. This results in an attention map that can capture both positive (red) and negative (blue) biological relationships.
  • Figure 4: Limitations of the standard fixed-size patch extraction and the motivation for our off-grid sampling. (a) Patch Gaps (Missed Context): Sparse sampling creates off-grid gaps between original patches, causing critical context to be ignored. (b) Truncated Morphological Structures: A single, coherent morphological structure is arbitrarily truncated by the grid boundaries. Our proposed pseudo-spots are sampled from these information-loss regions to restore the complete context.
  • Figure 5: Qualitative comparison of predicted gene expression heatmaps for FEAST and MERGE. Each row presents a side-by-side comparison for the cancer-relevant genes FASN and GNAS, showing the WSI, the ground truth gene expression, and the model predictions. The PCC scores in parentheses confirm that our model attains a higher correlation with the ground truth expressions.
  • ...and 9 more figures
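
The attention computation described in the Figure 3 caption can be sketched in plain Python as follows. This is a minimal single-head illustration, not the authors' implementation: the function names, the $1/\sqrt{d}$ scaling, and the use of a ready-made bias matrix for $\mathbf{B}_h$ are assumptions, since the caption does not specify them.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def negative_aware_attention(q, k, v, bias):
    """Single-head sketch of the positive-minus-negative weighting.

    q, k: n x d lists of floats; v: n x d_v; bias: n x n positional bias
    (standing in for B_h; how it is derived from spot distance is not
    specified in this section).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)  # assumption: standard dot-product scaling
    outputs, weights = [], []
    for i in range(n):
        raw = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(n)]
        # Positive branch uses qk^T, negative branch uses -qk^T;
        # both share the same positional bias.
        pos = softmax([raw[j] + bias[i][j] for j in range(n)])
        neg = softmax([-raw[j] + bias[i][j] for j in range(n)])
        w = [p - m for p, m in zip(pos, neg)]  # signed weights in [-1, 1]
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        )
    return outputs, weights


# Tiny example with two spots and a zero positional bias.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
bias = [[0.0, 0.0], [0.0, 0.0]]
out, attn = negative_aware_attention(q, k, v, bias)
```

Because each softmax sums to one, every row of the signed weight matrix sums to zero. In this formulation a spot therefore cannot attend positively to some spots without assigning negative weight to others.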