Multimodal Alignment Improves Generalizability of Genomic Biomarker Prediction in Computational Pathology

Ekaterina Redekop; Eric Zimmermann; Ava P Amini; Alex X Lu; Neil Tenenholtz; James Brian Hall; Lorin Crawford; Kristen A Severson

TL;DR

MARBLE aligns histopathology-derived representations with representations of genomic biomarkers generated by a large language model (LLM) and a protein language model (PLM) to enable data-efficient generalization to novel, out-of-distribution biomarkers.

Abstract

Computational pathology models that use digitized histopathology whole-slide images have the potential to become a cost-effective and scalable alternative to molecular assays for the prediction of genomic biomarkers, a key task in precision oncology. However, as new genomic biomarkers are discovered or quantified, large, labeled datasets must be prospectively collected to train new models. To address this challenge, we developed MARBLE, a multimodal contrastive pretraining strategy that integrates structured biomarker knowledge into representation learning of histopathology images. MARBLE aligns histopathology-derived representations with representations of genomic biomarkers generated by a large language model (LLM) and a protein language model (PLM). This biologically informed alignment enables data-efficient generalization to novel, out-of-distribution biomarkers. Using the MSK-IMPACT cohort of over 40,000 patients across multiple biomarker panel versions, we design experiments grounded in real-world data to demonstrate the value of our proposed approach.
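The alignment step described above can be illustrated with a minimal sketch. It assumes a CLIP-style symmetric InfoNCE objective over one aggregated histopathology vector and one aggregated biomarker vector per patient; the abstract says only that the pretraining is contrastive, so the exact loss, the temperature value, and the function names below are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(image_emb, biomarker_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings (assumed objective).

    Row i of image_emb is assumed to be paired with row i of biomarker_emb;
    every other row in the batch acts as a negative.
    """
    img = [l2_normalize(v) for v in image_emb]
    bio = [l2_normalize(v) for v in biomarker_emb]
    n = len(img)
    # Pairwise cosine similarities scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img[i], bio[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image -> biomarker direction: each row should peak on its diagonal entry.
    loss_i2b = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Biomarker -> image direction: same computation on the transposed matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_b2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2b + loss_b2i)


# Toy data: four "patients" with 8-dimensional embeddings (random placeholders).
random.seed(0)
img = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]

aligned = symmetric_contrastive_loss(img, img)                   # correct pairing
shuffled = symmetric_contrastive_loss(img, img[1:] + img[:1])    # wrong pairing
print(aligned < shuffled)
```

In this toy setup the correctly paired batch yields a lower loss than the mispaired one, which is the signal a contrastive objective uses to pull matching image and biomarker representations together.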

Paper Structure (28 sections, 7 equations, 7 figures, 5 tables)

Introduction
Background and Related Work
Contrastive representation learning in computational pathology
Protein language models
Large Language Models
Methods
Embeddings
Aggregation
Multimodal alignment
Biomarkers prediction
Experiments
Dataset
MARBLE setup
Comparison approaches
Implementation details
...and 13 more sections

Figures (7)

Figure 1: (a) The molecular profiling and tissue collection pipeline in cancer care: a tissue specimen undergoes DNA sequencing, routine H&E staining, and digitization. (b) Biomarker panel size across MSK-IMPACT versions (v3, v5, v6, v7). (c) Distribution of patient-level biomarker counts.
Figure 2: Overview of MARBLE. Aggregated histopathology embeddings from a frozen pathology foundation model (Path FM) are contrasted with aggregated biomarker embeddings derived from a frozen LLM (GPT-4o + Sentence-BERT (SBERT)) and/or a frozen PLM (ESM-2), enabling cross-modal alignment through contrastive pretraining.
Figure 3: (a) Supervised fine-tuning of the imaging encoder for multi-label biomarker classification. (b) Pretraining and fine-tuning cohort setups with panel-specific sample counts.
Figure 4: Performance comparison of multimodal pretraining across generalization setups and data regimes. Balanced accuracy (BA) of in-distribution and out-of-distribution biomarker prediction at three data regimes ($k$ = 100, 1000, 10000). (a) Results for Pretraining Scenario 1, using v3, v5, and v6 panels. (b) Results for Pretraining Scenario 2 using v3 and v5 panels.
Figure S1: Example of the biomarker-to-text transformation. For each biomarker defined by an oncogene, mutation, and cancer type, a canonical protein description is retrieved from UniProtKB and provided to an LLM (gpt-4o), which generates a concise text paragraph summarizing the mutation's role and the mechanism of action in the specified cancer. The resulting output (e.g., shown here for FGFR3 fusion in bladder urothelial carcinoma) is then embedded into a fixed-length vector representation using a text embedding model.
...and 2 more figures
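
Figure 4 reports balanced accuracy (BA). As a reminder of why BA is informative for rare biomarkers, here is a minimal sketch for a single binary biomarker, using the standard definition (the mean of sensitivity and specificity). How the paper thresholds predictions or averages BA across biomarkers is not stated in this excerpt, so that is left out; the function name and the toy labels are illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels in {0, 1}."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("balanced accuracy needs both classes in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# A biomarker present in 1 of 10 patients, and a model that always predicts "absent".
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
ba = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, ba)  # 0.9 0.5
```

The always-negative model scores 0.9 on plain accuracy but only 0.5 on BA, which is chance level for a binary task.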
