Prompting Whole Slide Image Based Genetic Biomarker Prediction

Ling Zhang; Boxiang Yun; Xingran Xie; Qingli Li; Xinxing Li; Yan Wang

TL;DR

This work predicts genetic biomarkers (e.g., microsatellite instability, MSI, and BRAF mutations) from gigapixel whole-slide images (WSIs) of colorectal cancer (CRC). It proposes PromptBio, a three-module framework in which prompts generated by a large language model guide coarse-grained foreground selection and the grouping of the selected instances into four fine-grained pathological components, followed by Transformer-based mining of the interactions among those components. Using LLM-generated pathology prompts and a coarse-to-fine strategy focused on cancer-associated stroma, PromptBio reaches MSI AUCs of 0.9149 on TCGA and 0.9125 on CPTAC, outperforming state-of-the-art MIL baselines, and its attention and component-similarity maps support clinical interpretation. The approach is validated on two CRC cohorts and the code is publicly available, suggesting a practical path toward interpretable, prompt-guided WSI-based biomarker prediction in clinical settings.
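For readers less familiar with the AUC figures quoted above, the following is a minimal sketch of how an AUC can be computed from slide-level labels and scores. It uses the pairwise-ranking definition (the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = MSI, 0 = MSS.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The quadratic loop is fine for a few hundred slides; a sort-based rank formulation would be the usual choice for larger cohorts.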

Abstract

Prediction of genetic biomarkers, e.g., microsatellite instability (MSI) and BRAF in colorectal cancer, is crucial for clinical decision making. In this paper, we propose a whole slide image (WSI) based genetic biomarker prediction method via prompting techniques. Our work aims at addressing the following challenges: (1) extracting foreground instances related to genetic biomarkers from gigapixel WSIs, and (2) modeling the interaction among the fine-grained pathological components in WSIs. Specifically, we leverage large language models to generate medical prompts that serve as prior knowledge in extracting instances associated with genetic biomarkers. We adopt a coarse-to-fine approach to mine biomarker information within the tumor microenvironment. This involves extracting instances related to genetic biomarkers using coarse medical prior knowledge, grouping pathology instances into fine-grained pathological components, and mining their interactions. Experimental results on two colorectal cancer datasets show the superiority of our method, which achieves an AUC of 91.49% for MSI classification. The analysis further shows the clinical interpretability of our method. Code is publicly available at https://github.com/DeepMed-Lab-ECNU/PromptBio.
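The coarse-to-fine idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes that patch (instance) features and prompt features already live in a shared embedding space, that coarse selection keeps the top fraction `beta` of instances by cosine similarity to a coarse prompt, and that grouping is a hard assignment to the most similar component prompt. The four component names come from the paper's Figure 2; the function names and the 5-dimensional toy vectors are hypothetical. The interaction-mining step (Transformer-based in the paper) is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_foreground(instances, coarse_prompt, beta):
    """Coarse step (assumed form): keep the top `beta` fraction of instances
    ranked by similarity to the coarse prompt. Returns sorted indices."""
    k = max(1, math.ceil(beta * len(instances)))
    ranked = sorted(
        range(len(instances)),
        key=lambda i: cosine(instances[i], coarse_prompt),
        reverse=True,
    )
    return sorted(ranked[:k])


def group_components(instances, kept, component_prompts):
    """Fine step (assumed form): assign each kept instance to the component
    whose prompt embedding it is most similar to."""
    groups = {name: [] for name in component_prompts}
    for i in kept:
        best = max(
            component_prompts,
            key=lambda name: cosine(instances[i], component_prompts[name]),
        )
        groups[best].append(i)
    return groups


# Toy embeddings (hypothetical): the last dimension stands for background.
instances = [
    [0.9, 0.1, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.1, 1.0],
    [0.1, 0.8, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.9, 0.2],
    [0.1, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.7, 0.1, 0.6],
]
coarse_prompt = [1.0, 1.0, 1.0, 1.0, 0.0]
component_prompts = {
    "lymphatic infiltration": [1.0, 0.0, 0.0, 0.0, 0.0],
    "inflammatory response": [0.0, 1.0, 0.0, 0.0, 0.0],
    "atypical lymphatic infiltration": [0.0, 0.0, 1.0, 0.0, 0.0],
    "irregular tumor infiltration boundaries": [0.0, 0.0, 0.0, 1.0, 0.0],
}

kept = select_foreground(instances, coarse_prompt, beta=0.5)
groups = group_components(instances, kept, component_prompts)
print(kept)
print(groups)
```

In this toy setup the background-dominated instances are discarded in the coarse step, so the grouping only sees instances that already resemble the coarse prompt. A soft assignment (similarity-weighted pooling per component) would be an equally plausible reading of the grouping step.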

Paper Structure (12 sections, 3 equations, 3 figures, 2 tables)

Introduction
Method
Coarse-grained Pathological Instance Selection
Prompt-Guided Fine-grained Pathological Component Grouping
Fine-grained Pathological Component Interaction Mining
Experiments
Experimental Setup
Comparison between PromptBio and Other Methods
Ablation Study
Conclusion
Acknowledgements
Disclosure of Interests

Figures (3)

Figure 1: Illustration of the PromptBio model. The overall framework consists of three parts: 1) a coarse-grained pathological instance selection module, 2) a prompt-guided fine-grained pathological component grouping module, and 3) a fine-grained pathological component interaction mining module. Given a dataset $\mathcal{D}$ consisting of $N$ pathology WSIs, PromptBio first performs coarse-grained pathological instance selection to extract the instances belonging to cancer-associated stroma. It then performs fine-grained pathological component grouping on the extracted instances, guided by pathology text prompts describing cancer-associated stroma with MSI. Finally, it performs fine-grained pathological component interaction mining.
Figure 2: Attention and t-SNE visualizations. For (a) MSI cancer and (b) MSS cancer, we show the attention-based contribution of each patch in a pathology image to the class token, together with the similarity between each patch and the pathological components. “Lym.”, “Infla.”, “Aty.” and “Irre.” refer to “lymphatic infiltration”, “inflammatory response”, “atypical lymphatic infiltration” and “irregular tumor infiltration boundaries”, respectively. (c) t-SNE visualization of the bag representations produced by different methods.
Figure 3: Performance changes when varying the selection ratio $\beta$ on the TCGA MSI dataset.
