Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

Jiaxiang Gou; Luping Ji; Pei Liu; Mao Ye

TL;DR

This work tackles incremental whole-slide image (WSI) classification by introducing QPMIL-VL, a Vision-Language (VL) framework that combines a queryable prototype MIL branch with a Class Feature Enhancement branch. The method uses a prototype pool of key-prompt pairs to guide bag-level feature extraction without replay buffers, and enhances class text features via ensemble prompts and a tunable vector, achieving state-of-the-art results on four TCGA WSI datasets. Key contributions include the first VL-based approach for incremental WSI classification, a memory-efficient prototype-guided aggregation mechanism, and extensive ablations demonstrating the impact of each component. The approach is of practical interest for pathology workflows because it enables continual learning across evolving datasets with strong performance and without data replay.

Abstract

Whole Slide Image (WSI) classification has important applications in clinical pathology, e.g., tumor identification and cancer diagnosis. Currently, most research attention is focused on Multiple Instance Learning (MIL) using static datasets. One of the most obvious weaknesses of these methods is that they cannot efficiently preserve and utilize previously learned knowledge: whenever new data arrive, classification models must be retrained on both the previous and the new data. To overcome this shortcoming and move beyond the traditional vision-only modality, this paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL), specially designed for incremental WSI classification. The framework mainly consists of two information processing branches: one generates bag-level features by prototype-guided aggregation of instance features, while the other enhances class features through a combination of class ensemble, tunable vector, and class similarity loss. Experiments on four public WSI datasets demonstrate that QPMIL-VL is effective for incremental WSI classification and often significantly outperforms the compared methods, achieving state-of-the-art (SOTA) performance. Our source code is publicly available at https://github.com/can-can-ya/QPMIL-VL.
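
The abstract names prototype-guided aggregation but does not give its mechanics, so the following is only a minimal illustrative sketch of one way a queryable prototype pool could drive bag-level aggregation. It is not the authors' implementation. Everything here is an assumption made for illustration: the function names (`select_prototypes`, `aggregate_bag`), the mean-pooled query, cosine similarity for key matching, and softmax weights over instances. In the actual framework the keys and prompts would be learned, and the instance features would come from a vision-language encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_prototypes(query, pool, top_k):
    """Return the top_k pool entries whose keys best match the query (assumed rule)."""
    ranked = sorted(pool, key=lambda p: cosine(query, p["key"]), reverse=True)
    return ranked[:top_k]

def aggregate_bag(instances, pool, top_k=1):
    """Hypothetical prototype-guided aggregation of instance features into a bag feature."""
    # Assumption: the bag's query is the mean of its instance features.
    query = mean_vector(instances)
    selected = select_prototypes(query, pool, top_k)
    # Assumption: the selected prompts are averaged into one guidance vector.
    guide = mean_vector([p["prompt"] for p in selected])
    # Instances that agree with the guidance vector receive larger weights.
    weights = softmax([cosine(x, guide) for x in instances])
    dim = len(query)
    bag = [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]
    return bag, weights, selected
```

In this sketch the pool is the only state carried between datasets, which mirrors the replay-free idea described above: no earlier slides are stored, only key-prompt pairs.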
