SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space

Ekaterina Redekop, Mara Pleasure, Zichen Wang, Kimberly Flores, Anthony Sisk, William Speier, Corey W. Arnold

TL;DR

SPADE addresses the need to jointly model histology and spatial transcriptomics to capture molecular heterogeneity in tissue. It learns an ST-informed latent space through a mixture of data experts trained with a contrastive objective on paired H&E and Visium data, guided by a two-step clustering strategy that enables hard-negative mining across many organs. Evaluated on 20 downstream tasks, SPADE consistently outperforms baselines, including methods relying on bulk RNA-seq, and shows strong improvements in cancer subtyping, survival, and biomarker prediction, with interpretable attention heatmaps highlighting tumor-focused regions. The work demonstrates the value of multimodal supervision for robust pathology representations and provides a scalable framework that can extend to additional ST modalities and organ types.

Abstract

The rapid growth of digital pathology and advances in self-supervised deep learning have enabled the development of foundation models for pathology tasks across diverse diseases. While multimodal approaches integrating diverse data sources have emerged, a gap remains in the comprehensive integration of whole-slide images (WSIs) with spatial transcriptomics (ST), which is crucial for capturing molecular heterogeneity beyond standard hematoxylin & eosin (H&E) staining. We introduce SPADE, a foundation model that integrates histopathology with ST data to guide image representation learning within a unified framework, in effect creating an ST-informed latent space. SPADE uses a mixture-of-data-experts technique: experts are created via two-stage clustering in the imaging feature space, and contrastive learning is used to learn representations of co-registered WSI patches and gene expression profiles. Pre-trained on the HEST-1k dataset, SPADE is evaluated on 20 downstream tasks, where it demonstrates significantly superior few-shot performance compared to baseline models, highlighting the benefits of integrating morphological and molecular information into one latent space. Code and pretrained weights are available at https://github.com/uclabair/SPADE.
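
To make the contrastive alignment of co-registered patches and gene expression profiles concrete, the following is a minimal NumPy sketch. It assumes a CLIP-style symmetric InfoNCE loss; the abstract does not state the exact objective, so the function names, the temperature value, and the embedding sizes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_contrastive_loss(img_emb, gene_emb, temperature=0.07):
    """Hypothetical CLIP-style loss: row i of img_emb is paired with row i of gene_emb.

    Matching pairs sit on the diagonal of the similarity matrix; every other
    entry in the same row or column acts as a negative.
    """
    img = l2_normalize(img_emb)
    gene = l2_normalize(gene_emb)
    logits = img @ gene.T / temperature              # (batch, batch) similarities
    idx = np.arange(len(img))
    loss_i2g = -log_softmax(logits)[idx, idx].mean()    # image -> expression
    loss_g2i = -log_softmax(logits.T)[idx, idx].mean()  # expression -> image
    return 0.5 * (loss_i2g + loss_g2i)

# Toy data standing in for projected patch and gene-expression embeddings.
rng = np.random.default_rng(0)
img_emb = rng.normal(size=(8, 16))
aligned_loss = symmetric_contrastive_loss(
    img_emb, img_emb + 0.01 * rng.normal(size=(8, 16))
)
shuffled_loss = symmetric_contrastive_loss(img_emb, rng.normal(size=(8, 16)))
```

In this toy setup the nearly identical pairs should yield a lower loss than the unrelated pairs, which is the behavior a contrastive objective is meant to reward.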

Paper Structure

This paper contains 31 sections, 9 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: Overview of SPADE workflow. a. A WSI is segmented and divided into a set of non-overlapping patches. A compressed feature for each patch is obtained through a pre-trained feature encoder. A corresponding gene expression vector is obtained after preprocessing. Two-step K-Means clustering is performed to create data experts across all WSIs in the dataset. b. For inference, three data-expert routing techniques are evaluated (Section \ref{sec:slide_repr}).
  • Figure 2: Two-step clustering framework for data expert construction. Each WSI is divided into patches and encoded into patch-level features. Step 1: Within each tissue type, K-Means clustering is applied to obtain tissue-specific centroids. Step 2: The centroids across tissue types are clustered again to form a final unified set of $K$ clusters (data experts) used for data-expert training.
  • Figure 3: Interpretability of SPADE. Attention scores for a prostate WSI positive for biochemical recurrence (BCR) are visualized as a heatmap for the UNI and SPADE models. The deeper the red color, the more attention the model puts on that region of the tissue. The black outline shows the predicted tumor boundary from our internal cancer prediction model for whole-mount prostatectomy WSIs. SPADE shows a higher concentration of attention within the tumor boundary compared to UNI, with some areas of focus along the tumor border. Sub-crops are selected to show the patterns with high attention at higher magnifications.
  • Figure 4: Interpretability of SPADE on the TCGA-CRC BRAF task. Attention scores for a colorectal cancer WSI are visualized as a heatmap for the UNI and SPADE models. The deeper the red color, the more attention the model puts on that region of the tissue. Sub-crops are selected to show the patterns with high attention at higher magnifications. A pathologist noted differences between the UNI and SPADE heatmaps (see Section \ref{sec:crc_att}). The cancer outline, delineated by a pathologist, is shown in black.
  • Figure 5: Highest-attention patches, SPADE vs. UNI. We used the attention scores to extract the 50 highest-scoring attention patches from our model, SPADE, and compared them to the 50 highest-scoring attention patches from UNI. We then provided the resulting plot to a pathologist for review, with model names withheld. The pathologist noted differences between the highest-attention patches of UNI and SPADE (see Section \ref{sec:crc_att}).
  • ...and 3 more figures
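
The two-step clustering described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses a plain NumPy Lloyd's K-Means, random toy features, and hypothetical names (`kmeans`, `two_step_experts`, `k_per_tissue`, `k_experts`). It also assumes each patch inherits the expert of its step-1 centroid, which the caption does not spell out.

```python
import numpy as np

def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute labels against the final centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def two_step_experts(features_by_tissue, k_per_tissue, k_experts):
    """Assign every patch to one of k_experts data experts.

    Step 1: cluster patch features within each tissue type.
    Step 2: cluster the pooled tissue-specific centroids into k_experts groups.
    Assumption: a patch inherits the expert of its step-1 centroid.
    """
    step1 = {t: kmeans(f, k_per_tissue) for t, f in features_by_tissue.items()}
    pooled = np.concatenate([centroids for centroids, _ in step1.values()])
    _, centroid_to_expert = kmeans(pooled, k_experts)

    assignments, offset = {}, 0
    for tissue, (centroids, labels) in step1.items():
        mapping = centroid_to_expert[offset:offset + len(centroids)]
        assignments[tissue] = mapping[labels]
        offset += len(centroids)
    return assignments

# Toy patch features standing in for encoder outputs (values are random).
rng = np.random.default_rng(0)
features_by_tissue = {
    "prostate": rng.normal(size=(200, 8)),
    "colon": rng.normal(loc=2.0, size=(150, 8)),
}
experts = two_step_experts(features_by_tissue, k_per_tissue=4, k_experts=3)
```

Clustering within each tissue first keeps the second K-Means pass small, since it operates on a handful of centroids per tissue rather than on every patch in the dataset.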