MAPLE: Multi-scale Attribute-enhanced Prompt Learning for Few-shot Whole Slide Image Classification

Junjie Zhou, Wei Shao, Yagao Yue, Wei Mu, Peng Wan, Qi Zhu, Daoqiang Zhang

TL;DR

MAPLE tackles few-shot whole slide image (WSI) classification by marrying multiple instance learning (MIL) with vision-language prompting through a hierarchical, multi-scale approach. It uses LLM-generated entity- and slide-level prompts, language-guided instance selection, and a cross-scale graph to fuse fine-grained histology with global slide context, producing entity- and slide-level predictions that are then combined. Ablation and visualization analyses validate the effectiveness and interpretability of the entity-level prompts and cross-scale reasoning, while experiments on TCGA cohorts demonstrate robust improvements over state-of-the-art MIL and prompt-based methods. By aligning with pathologists' diagnostic workflows and reducing the annotation burden, MAPLE offers a practical, interpretable solution for pathology AI in the few-shot regime.
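
The idea of language-guided instance selection can be pictured with a small sketch. It assumes that patch features and an entity prompt embedding already share one vision-language embedding space. It keeps the top-k patches by cosine similarity to the prompt and pools them with softmax weights. The function names, the hard top-k rule, and the temperature value are hypothetical illustrations, not MAPLE's actual module.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def select_instances(patch_feats, prompt_emb, k):
    """Indices of the k patches most similar to the prompt, plus all similarity scores."""
    scores = [cosine(p, prompt_emb) for p in patch_feats]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


def entity_feature(patch_feats, prompt_emb, k, temperature=0.1):
    """Pool the top-k prompt-relevant patches with softmax weights.

    Hypothetical stand-in for an entity-guided cross-attention module.
    """
    idx, scores = select_instances(patch_feats, prompt_emb, k)
    logits = [scores[i] / temperature for i in idx]
    m = max(logits)
    weights = [math.exp(s - m) for s in logits]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(prompt_emb)
    pooled = [sum(w * patch_feats[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return pooled, idx


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    nuclei_prompt = [1.0, 0.0]  # hypothetical text embedding of an entity prompt
    feat, chosen = entity_feature(patches, nuclei_prompt, k=2)
    print(chosen)  # [0, 1]
```

Only the two patches pointing in the prompt's direction are kept. A hard top-k followed by soft weighting is just one simple way to combine instance selection with attention, and it was chosen here for readability.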

Abstract

Prompt learning has emerged as a promising paradigm for adapting pre-trained vision-language models (VLMs) to few-shot whole slide image (WSI) classification by aligning visual features with textual representations, thereby reducing annotation cost and enhancing model generalization. Nevertheless, existing methods typically rely on slide-level prompts and fail to capture the subtype-specific phenotypic variations of histological entities (e.g., nuclei, glands) that are critical for cancer diagnosis. To address this gap, we propose Multi-scale Attribute-enhanced Prompt Learning (MAPLE), a hierarchical framework for few-shot WSI classification that jointly integrates multi-scale visual semantics and performs prediction at both the entity and slide levels. Specifically, we first leverage large language models (LLMs) to generate entity-level prompts, which help identify multi-scale histological entities and their phenotypic attributes, as well as slide-level prompts that capture global visual descriptions. Then, an entity-guided cross-attention module generates entity-level features, which are aligned with their corresponding subtype-specific attributes for fine-grained entity-level prediction. To enrich entity representations, we further develop a cross-scale entity graph learning module that updates these representations by capturing their semantic correlations within and across scales. The refined representations are then aggregated into a slide-level representation and aligned with the corresponding slide-level prompts for slide-level prediction. Finally, we combine the entity-level and slide-level outputs to produce the final prediction. Results on three cancer cohorts confirm the effectiveness of our approach for few-shot pathology diagnosis.
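
The following sketch shows roughly how the entity-level and slide-level branches described above could fit together, using plain lists as embeddings. It assumes three precomputed inputs: entity features, one attribute embedding per (class, entity) pair, and one slide prompt embedding per class. The fully connected similarity-weighted graph, the mean-pooling aggregation, and the fixed fusion weight `alpha` are hypothetical simplifications, not the paper's formulation. Following the order in the abstract, entity-level scores use the features before graph refinement, and slide-level scores use the refined ones.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def graph_update(nodes):
    """One round of similarity-weighted message passing (hypothetical simplification).

    Every pair of entity nodes is connected, so nodes from different scales exchange
    messages exactly as nodes from the same scale do. Negative similarities are clipped.
    """
    dim = len(nodes[0])
    refined = []
    for i, node in enumerate(nodes):
        weights = [0.0 if j == i else max(cosine(node, other), 0.0)
                   for j, other in enumerate(nodes)]
        total = sum(weights)
        if total == 0.0:
            refined.append(list(node))  # no positive neighbours: keep the node unchanged
            continue
        message = [sum(w * other[d] for w, other in zip(weights, nodes)) / total
                   for d in range(dim)]
        # Residual update: average each node with its aggregated neighbourhood message.
        refined.append([(node[d] + message[d]) / 2.0 for d in range(dim)])
    return refined


def predict(entity_feats, attr_embs, slide_prompts, alpha=0.5, temperature=0.1):
    """Fuse entity-level and slide-level class probabilities.

    entity_feats:  list of E entity feature vectors
    attr_embs:     attr_embs[c][e] is the attribute embedding of entity e for class c
    slide_prompts: slide_prompts[c] is the slide-level prompt embedding for class c
    """
    num_classes = len(slide_prompts)
    # Entity-level scores: mean alignment between each entity and its class attribute.
    entity_scores = [
        sum(cosine(f, attr_embs[c][e]) for e, f in enumerate(entity_feats)) / len(entity_feats)
        for c in range(num_classes)
    ]
    # Slide-level scores: refine entities on the graph, mean-pool, align with slide prompts.
    refined = graph_update(entity_feats)
    dim = len(refined[0])
    slide_feat = [sum(r[d] for r in refined) / len(refined) for d in range(dim)]
    slide_scores = [cosine(slide_feat, p) for p in slide_prompts]
    # Final prediction: convex combination of the two probability vectors.
    p_entity = softmax(entity_scores, temperature)
    p_slide = softmax(slide_scores, temperature)
    return [alpha * pe + (1.0 - alpha) * ps for pe, ps in zip(p_entity, p_slide)]


if __name__ == "__main__":
    entity_feats = [[1.0, 0.0], [0.8, 0.2]]
    attr_embs = [[[1.0, 0.0], [1.0, 0.0]],   # hypothetical class 0 attribute embeddings
                 [[0.0, 1.0], [0.0, 1.0]]]   # hypothetical class 1 attribute embeddings
    slide_prompts = [[1.0, 0.0], [0.0, 1.0]]
    print(predict(entity_feats, attr_embs, slide_prompts))
```

In this toy input, both branches favour class 0, so the fused vector puts almost all of its probability mass on class 0. Averaging probabilities with a fixed `alpha` is the simplest fusion rule. A learned or validation-tuned weight would be a natural alternative.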

Paper Structure

This paper contains 63 sections, 15 equations, 12 figures, 8 tables, 1 algorithm.

Figures (12)

  • Figure 1: Comparison of MAPLE with existing slide-level alignment methods for the classification of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). (a) Existing methods align slide-level features with corresponding prompts for classification. (b) Our proposed MAPLE introduces additional entity-level features and incorporates subtype-specific phenotypic attributes for more interpretable and precise alignment. For simplicity, only the single-scale data stream of MAPLE is visualized.
  • Figure 2: Framework of our proposed MAPLE. (a) MAPLE leverages the LLM to identify multi-scale histological entities, and then builds a cross-scale entity graph by modeling the semantic relationships within and across scales. (b) Both entity-level and slide-level prompts are enriched with learnable context vectors to enable effective alignment with corresponding visual features. (c) MAPLE jointly integrates multi-scale visual semantics and performs prediction at both the entity and slide levels.
  • Figure 3: t-SNE results of entity-level (a–c) and slide-level (d) embeddings on the TCGA-NSCLC dataset.
  • Figure 4: Visualization of entity-relevant patches selected by the entity-guided cross-attention module for lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) on the TCGA-NSCLC dataset. Top rows show patches and their corresponding entity attributes (e.g., stroma) at low resolution, while bottom rows show patches and their corresponding entity attributes (e.g., nucleoli) at high resolution.
  • Figure 5: Example of the constructed prompts at low resolution on the TCGA-NSCLC dataset.
  • ...and 7 more figures