Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification

Linhao Qu; Dingkang Yang; Dan Huang; Qinhao Guo; Rongkui Luo; Shaoting Zhang; Xiaosong Wang

TL;DR

The paper tackles few-shot weakly supervised WSI classification in pathology, where data are scarce because of patient privacy and disease rarity. It introduces PEMP, a pathology-knowledge enhanced multi-instance prompt learning framework that injects visual and textual priors at the patch and slide levels through a mix of static and learnable prompts, supported by lightweight Messenger and Summary layers. Three learning streams (visual prompts, textual prompts, and two-level prompt alignment) are coupled with alignment losses to fuse vision and language representations within a frozen CLIP backbone. Across three clinical tasks covering five tumor types, PEMP delivers consistent gains over state-of-the-art methods in few-shot regimes, and the pathology patterns it retrieves make its results interpretable.
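The alignment losses are named here but not specified. As a minimal sketch, assuming a CLIP-style symmetric InfoNCE objective (an assumption: the paper's exact loss form, temperature, and pairing scheme are not given in this summary), matched visual and textual prompt features are pulled together while mismatched pairs are pushed apart. The function names and the temperature of 0.07 (CLIP's initial value) are illustrative choices, and the features are random stand-ins.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(visual, textual, temperature=0.07):
    """Hypothetical alignment loss: row i of `visual` should match row i of `textual`.

    visual, textual: (n_pairs, dim) learnable visual / textual prompt features,
    e.g. at the patch level or the slide level (illustrative inputs).
    """
    v, t = l2_normalize(visual), l2_normalize(textual)
    logits = v @ t.T / temperature                            # (n_pairs, n_pairs) similarities
    idx = np.arange(len(v))                                   # positives lie on the diagonal
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # visual -> text direction
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> visual direction
    return 0.5 * (loss_v2t + loss_t2v)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))
aligned = symmetric_contrastive_loss(feats, feats + 0.01 * rng.normal(size=feats.shape))
shuffled = symmetric_contrastive_loss(feats, feats[::-1])     # positives moved off the diagonal
```

With matched pairs the loss is close to zero, while reversing the textual rows makes it large. Under this assumption, the same function could be applied separately to patch-level and slide-level prompt pairs.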

Abstract

Current multi-instance learning (MIL) algorithms for pathology image analysis typically require a large number of Whole Slide Images (WSIs) for effective training and perform poorly when training data are limited. In clinical settings, restricted access to pathology slides is unavoidable because of patient privacy concerns and the prevalence of rare or emerging diseases. Few-shot Weakly Supervised WSI Classification (FSWC) has emerged to address this challenge of limited slide data and sparse slide-level diagnostic labels. Prompt learning built on pre-trained vision-language models (e.g., CLIP) is a promising approach for this setting; however, research in this area remains limited, and existing algorithms often rely only on patch-level prompts or are restricted to language prompts. This paper proposes a multi-instance prompt learning framework enhanced with pathology knowledge, i.e., one that integrates visual and textual prior knowledge into prompts at both the patch and slide levels. Training combines static and learnable prompts, which effectively guide the activation of the pre-trained model and facilitate the diagnosis of key pathology patterns. Lightweight Messenger (self-attention) and Summary (attention-pooling) layers model the relationships between patches and slides from the same patient. In addition, alignment-wise contrastive losses enforce feature-level alignment between the visual and textual learnable prompts for both patches and slides. Our method achieves superior performance on three challenging clinical tasks, significantly outperforming the compared few-shot methods.
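The abstract describes Messenger as self-attention and Summary as attention pooling, with no further detail. The sketch below is a minimal NumPy illustration under stated assumptions: single-head attention with a residual connection for Messenger, the tanh-based attention-MIL pooling form for Summary, random weights instead of trained ones, and a two-stage patch-to-slide-to-patient hierarchy. All class and variable names are hypothetical, and the patch features are random stand-ins for frozen CLIP embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class Messenger:
    """Hypothetical Messenger: single-head self-attention with a residual connection,
    so every instance (patch or slide) aggregates information from the others."""

    def __init__(self, dim, rng):
        s = 1.0 / np.sqrt(dim)
        self.wq, self.wk, self.wv = (rng.normal(scale=s, size=(dim, dim)) for _ in range(3))
        self.dim = dim

    def __call__(self, x):  # x: (n_instances, dim)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)  # (n, n), rows sum to 1
        return x + attn @ v  # residual keeps the original features

class Summary:
    """Hypothetical Summary: attention pooling a = softmax(w^T tanh(V h)), z = sum_i a_i h_i."""

    def __init__(self, dim, hidden, rng):
        self.V = rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, hidden))
        self.w = rng.normal(scale=1.0 / np.sqrt(hidden), size=hidden)

    def __call__(self, x):  # x: (n_instances, dim)
        weights = softmax(np.tanh(x @ self.V) @ self.w, axis=0)  # (n,), sums to 1
        return weights @ x, weights  # pooled vector (dim,), attention weights (n,)

# Illustrative patient with two slides of different sizes.
rng = np.random.default_rng(0)
dim = 64
patient = [rng.normal(size=(n_patches, dim)) for n_patches in (30, 45)]

patch_messenger, patch_summary = Messenger(dim, rng), Summary(dim, 32, rng)
slide_messenger, slide_summary = Messenger(dim, rng), Summary(dim, 32, rng)

# Patch level: exchange information among patches, then pool each slide to one vector.
slide_vecs = np.stack([patch_summary(patch_messenger(p))[0] for p in patient])  # (2, dim)
# Slide level: exchange information among the patient's slides, then pool to one patient vector.
patient_vec, slide_weights = slide_summary(slide_messenger(slide_vecs))
```

The attention weights returned by Summary are one plausible source for the kind of key-pattern retrieval the paper visualizes, since they rank instances by their contribution to the pooled vector; whether PEMP uses them this way is not stated here.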

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, and 4 tables.

Introduction
Related Work
  MIL for WSI Classification
  VLM-Based Prompt Learning
FSWC Baseline Framework
  Problem Formulation
  Attention Aggregation for WSI Classification
  VLM Backbone and Few-shot Prompt Learning
Knowledge-enhanced Prompt Learning
  Overview
  Visual Prompt Learning
  Textual Prompt Learning
  Alignment of Visual and Textual Prompts
Experiment and Result
  Results
...and 1 more section

Figures (4)

Figure 1: Existing methods such as CoOp [56] do not fully consider task-related pathology visual features and their association with specific terms in FSWC. We use aligned task-specific image examples and language descriptions to enhance visual and textual prompt learning at both the patch and slide levels.
Figure 2: Overview of PEMP, where flames represent optimized parameters, and snowflakes indicate frozen parameters during training.
Figure 3: (A) and (B): examples of visual and textual token construction related to the prognosis of patients with early-stage cervical cancer.
Figure 4: Visualization of key pathology patterns indicating both good and poor prognosis retrieved by PEMP from the test set.
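Figure 3 concerns how visual and textual tokens are constructed, and the abstract describes training with a combination of static and learnable prompts. As a hedged illustration only, the following assumes a CoOp-style layout in which a few learnable context vectors precede frozen embeddings of a pathology description and the class name. The token order, token counts, embedding width, and all names are assumptions, not the paper's specification.

```python
import numpy as np

def build_prompt(learnable_ctx, static_tokens, class_emb):
    """Assemble one hypothetical textual prompt as a token-embedding sequence.

    learnable_ctx: (n_ctx, dim) trainable context vectors (CoOp-style)
    static_tokens: (n_static, dim) frozen embeddings of a pathology description
    class_emb:     (dim,) frozen embedding of the class name
    Returns the (n_ctx + n_static + 1, dim) sequence and a mask of trainable rows.
    """
    seq = np.concatenate([learnable_ctx, static_tokens, class_emb[None, :]], axis=0)
    trainable = np.zeros(len(seq), dtype=bool)
    trainable[: len(learnable_ctx)] = True  # only the context rows would receive gradients
    return seq, trainable

rng = np.random.default_rng(0)
dim = 512                                    # illustrative embedding width
ctx = rng.normal(scale=0.02, size=(4, dim))  # small random init, as CoOp does
static = rng.normal(size=(6, dim))           # stand-in for frozen knowledge-token embeddings
cls_emb = rng.normal(size=dim)
seq, mask = build_prompt(ctx, static, cls_emb)
```

In a full model, `seq` would be fed through the frozen CLIP text encoder and only the rows flagged in `mask` would be updated; a visual counterpart could analogously mix frozen example-image features with learnable visual tokens.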
