Towards Unified Molecule-Enhanced Pathology Image Representation Learning via Integrating Spatial Transcriptomics

Minghao Han, Dingkang Yang, Jiabei Cheng, Xukun Zhang, Linhao Qu, Zizhi Chen, Lihua Zhang

TL;DR

UMPIRE addresses the limitation of image-text pretraining in computational pathology by fusing pathology images with spatial transcriptomics to produce molecule-aware representations. It introduces a two-stage pretraining pipeline where Visiumformer is trained on ViSTomics-4M ST data and then aligned with pathology image encoders via a symmetric contrastive objective, with an optional reconstruction objective and a query-reference expression predictor. Across six downstream tasks, UMPIRE consistently outperforms baselines and demonstrates strong cross-platform generalization, showcasing the value of integrating molecular data into foundational pathology models. The work provides a public codebase and pretrained weights, establishing a foundation for future molecule-enhanced computational pathology research.
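
The symmetric contrastive objective mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the common CLIP-style form, in which paired image and gene-expression embeddings are L2-normalized, compared by cosine similarity, and scored with cross-entropy in both directions. The function names, the `temperature` value of 0.07, and the toy embeddings are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, gene_emb, temperature=0.07):
    """Assumed CLIP-style symmetric loss over a batch of paired embeddings."""
    img = [l2_normalize(v) for v in image_emb]
    gen = [l2_normalize(v) for v in gene_emb]
    # logits[i][j]: scaled cosine similarity between image i and gene profile j
    logits = [[sum(a * b for a, b in zip(u, v)) / temperature for v in gen]
              for u in img]
    logits_t = [list(col) for col in zip(*logits)]  # gene-to-image direction
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(logits_t))


# Toy batch of two pairs: matched pairs should score far lower than swapped ones.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
```

Averaging the image-to-gene and gene-to-image terms is what makes the loss symmetric: neither modality is treated as the sole query, so both encoders are pulled toward a shared embedding space.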

Abstract

Recent multimodal pre-training models have significantly advanced computational pathology. However, current approaches predominantly rely on visual-language models, which may impose limitations from a molecular perspective and lead to performance bottlenecks. Here, we introduce a Unified Molecule-enhanced Pathology Image REpresentation Learning framework (UMPIRE). UMPIRE aims to leverage complementary information from gene expression profiles to guide the multimodal pre-training, enhancing the molecular awareness of pathology image representation learning. We demonstrate that this molecular perspective provides a robust, task-agnostic training signal for learning pathology image embeddings. Due to the scarcity of paired data, approximately 4 million entries of spatial transcriptomics gene expression were collected to train the gene encoder. By leveraging powerful pre-trained encoders, UMPIRE aligns the encoders across over 697K pathology image-gene expression pairs. The performance of UMPIRE is demonstrated across various molecular-related downstream tasks, including gene expression prediction, spot classification, and mutation state prediction in whole slide images. Our findings highlight the effectiveness of multimodal data integration and open new avenues for exploring computational pathology enhanced by molecular perspectives. The code and pre-trained weights are available at https://github.com/Hanminghao/UMPIRE.

Paper Structure

This paper contains 35 sections, 11 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Overview of $\textsc{Umpire}$. First, approximately 4 million unlabeled spatial transcriptomics (ST) gene expression profiles are used to pre-train Visiumformer for gene encoding. Next, a pre-trained pathology Vision Transformer is adopted as the vision encoder. The symmetric contrastive loss $\mathcal{L}_{\text{SCL}}$ is then applied to align embeddings from both modalities.
  • Figure 2: Evaluation of Downstream Tasks. $\textsc{Umpire}$ and baselines are assessed on: a. Bimodal gene expression prediction; b. Unimodal patch/spot classification; c. Vision-based WSI mutation state prediction.
  • Figure 3: Visualization of Bimodal Gene Expression Prediction. Ground truth and predicted spatially resolved expression levels for PIBF1 overlaying the whole slide image of sample patient-1-H2-5, visualized with a fixed (top) and a variable (bottom) color scale.
  • Figure 4: Visualization of Linear Probing. a. Whole Slide Image and Ground Truth; b. Predicted spot/patch types for sample 151673, visualized before (top) and after (bottom) multimodal pre-training with contrastive loss; c. with reconstruction loss.
  • Figure 5: MIL-based WSI Classification. Comparison of $\textsc{Umpire}$ and baselines for WSI-level gene mutation state classification using MIL. a. Based on Phikon. b. Based on UNI.
  • ...and 6 more figures