The Rise of AI Language Pathologists: Exploring Two-level Prompt Learning for Few-shot Weakly-supervised Whole Slide Image Classification
Linhao Qu, Xiaoyuan Luo, Kexue Fu, Manning Wang, Zhijian Song
TL;DR
This work addresses the problem of Few-shot Weakly Supervised WSI Classification (FSWC) by proposing TOP, a two-level prompt learning MIL framework that fuses GPT-4-derived language priors with CLIP-based vision-language representations. By introducing instance prompt guided pooling and a bag-level prompt group, TOP effectively aggregates patch-level features into robust bag representations and supports few-shot learning under MIL constraints. Empirical results on Camelyon16, TCGA-Lung, and a Cervical cancer WSI dataset show state-of-the-art bag- and instance-level performance across 1–16 labeled bags per class, with substantial improvements over strong baselines. The approach highlights the potential of language-prior-informed prompts to boost pathology WSI classification in data-scarce scenarios, offering a practical path toward scalable AI-assisted pathology.
Abstract
This paper introduces the novel concept of few-shot weakly supervised learning for pathology Whole Slide Image (WSI) classification, denoted as FSWC. A solution is proposed based on prompt learning and the utilization of a large language model, GPT-4. Since a WSI is too large and needs to be divided into patches for processing, WSI classification is commonly approached as a Multiple Instance Learning (MIL) problem. In this context, each WSI is considered a bag, and the obtained patches are treated as instances. The objective of FSWC is to classify both bags and instances with only a limited number of labeled bags. Unlike conventional few-shot learning problems, FSWC poses additional challenges due to its weak bag labels within the MIL framework. Drawing inspiration from the recent achievements of vision-language models (V-L models) in downstream few-shot classification tasks, we propose a two-level prompt learning MIL framework tailored for pathology, incorporating language prior knowledge. Specifically, we leverage CLIP to extract instance features for each patch, and introduce a prompt-guided pooling strategy to aggregate these instance features into a bag feature. Subsequently, we employ a small number of labeled bags to facilitate few-shot prompt learning based on the bag features. Our approach incorporates the utilization of GPT-4 in a question-and-answer mode to obtain language prior knowledge at both the instance and bag levels, which are then integrated into the instance and bag level language prompts. Additionally, a learnable component of the language prompts is trained using the available few-shot labeled data. We conduct extensive experiments on three real WSI datasets encompassing breast cancer, lung cancer, and cervical cancer, demonstrating the notable performance of the proposed method in bag and instance classification. All codes will be available.
