Towards Unified Molecule-Enhanced Pathology Image Representation Learning via Integrating Spatial Transcriptomics
Minghao Han, Dingkang Yang, Jiabei Cheng, Xukun Zhang, Linhao Qu, Zizhi Chen, Lihua Zhang
TL;DR
UMPIRE addresses the limitations of image-text pretraining in computational pathology by fusing pathology images with spatial transcriptomics (ST) to produce molecule-aware representations. It introduces a two-stage pretraining pipeline: Visiumformer is first trained on the ViSTomics-4M ST data and then aligned with pathology image encoders via a symmetric contrastive objective, with an optional reconstruction objective and a query-reference expression predictor. Across six downstream tasks, UMPIRE consistently outperforms baselines and demonstrates strong cross-platform generalization, showing the value of integrating molecular data into pathology foundation models. The work provides a public codebase and pretrained weights as a basis for future molecule-enhanced computational pathology research.
Abstract
Recent advances in multimodal pre-training have significantly improved computational pathology. However, current approaches predominantly rely on visual-language models, which may be limited from a molecular perspective and lead to performance bottlenecks. Here, we introduce a Unified Molecule-enhanced Pathology Image REpresentation Learning framework (UMPIRE). UMPIRE leverages complementary information from gene expression profiles to guide multimodal pre-training, enhancing the molecular awareness of pathology image representation learning. We demonstrate that this molecular perspective provides a robust, task-agnostic training signal for learning pathology image embeddings. Because paired data are scarce, approximately 4 million spatial transcriptomics gene expression entries were collected to train the gene encoder. Building on powerful pre-trained encoders, UMPIRE then aligns the image and gene encoders on more than 697K pathology image-gene expression pairs. The performance of UMPIRE is demonstrated across various molecular-related downstream tasks, including gene expression prediction, spot classification, and mutation state prediction in whole slide images. Our findings highlight the effectiveness of multimodal data integration and open new avenues for exploring computational pathology enhanced by molecular perspectives. The code and pre-trained weights are available at https://github.com/Hanminghao/UMPIRE.
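
To make the alignment step concrete, the following is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over paired image and gene expression embeddings. It is an illustration under stated assumptions, not the UMPIRE implementation: the function names, the temperature value of 0.07, and the use of plain Python lists instead of tensors are all choices made here for readability and do not come from the paper. Each row of `image_emb` is assumed to be paired with the same row of `gene_emb`, and all vectors are assumed to be nonzero.

```python
import math


def l2_normalize(vec):
    """Scale a nonzero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(scores, target):
    """Log-probability of index `target` under a softmax over `scores` (numerically stable)."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_sum


def symmetric_contrastive_loss(image_emb, gene_emb, temperature=0.07):
    """Average of image-to-gene and gene-to-image cross-entropy losses.

    The temperature default is an assumption for this sketch, not a value
    reported in the source.
    """
    img = [l2_normalize(v) for v in image_emb]
    gene = [l2_normalize(v) for v in gene_emb]
    n = len(img)

    # logits[i][j]: scaled cosine similarity between image i and gene profile j.
    logits = [
        [sum(a * b for a, b in zip(img[i], gene[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image -> gene: each image should pick out its own gene profile (row-wise softmax).
    loss_i2g = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n

    # Gene -> image: each gene profile should pick out its own image (column-wise softmax).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_g2i = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n

    return 0.5 * (loss_i2g + loss_g2i)
```

Correctly paired embeddings should yield a lower loss than mismatched ones, and swapping the two arguments leaves the value unchanged, which is the sense in which the objective is symmetric. In practice such a loss would be computed on batched tensors in a deep learning framework; the list-based form here only exposes the arithmetic.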
