Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

Jiaxiang Gou, Luping Ji, Pei Liu, Mao Ye

TL;DR

This work tackles incremental whole-slide image classification by introducing QPMIL-VL, a Vision-Language framework that combines queryable prototype MIL with a Class Feature Enhancement branch. The method uses a prototype pool of key-prompt pairs to guide bag-level feature extraction without replay buffers, and enhances class text features via ensemble prompts and a tunable vector, achieving state-of-the-art results on four TCGA WSI datasets. Key contributions include the first VL-based approach for incremental WSI classification, a memory-efficient prototype-guided aggregation mechanism, and extensive ablations demonstrating the impact of each component. The approach promises practical benefits for pathology workflows by enabling continual learning across evolving datasets with strong performance and without data replay.

Abstract

Whole Slide Image (WSI) classification has important applications in clinical pathology, e.g., tumor identification and cancer diagnosis. Most current research focuses on Multiple Instance Learning (MIL) with static datasets. A major weakness of these methods is that they cannot efficiently preserve and reuse previously learned knowledge: whenever new data arrive, the classification model must be re-trained on both the previous and the new data. To overcome this shortcoming and move beyond the purely visual modality, this paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL), designed specifically for incremental WSI classification. The framework consists mainly of two information processing branches: one generates bag-level features by prototype-guided aggregation of instance features, while the other enhances class features through a combination of class ensemble, a tunable vector, and a class similarity loss. Experiments on four public WSI datasets demonstrate that QPMIL-VL is effective for incremental WSI classification and often significantly outperforms the compared methods, achieving state-of-the-art (SOTA) performance. Our source code is publicly available at https://github.com/can-can-ya/QPMIL-VL.
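
To make the idea of a "queryable" prototype pool more concrete, the following is a minimal sketch of query-key matching: a bag-level query is compared with each key in the pool by cosine similarity, and the indices of the N best-matching keys are returned so that their paired prompts could then guide aggregation. It is not the authors' implementation. The mean-pooled query, the function names (`mean_pool`, `match_prototypes`), and the toy vectors are all assumptions made for illustration; the abstract does not specify how the query is built.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pool(instances):
    """Assumed query construction: average the instance features of one bag (one WSI)."""
    n = len(instances)
    return [sum(column) / n for column in zip(*instances)]


def match_prototypes(query, keys, n):
    """Return the indices of the n keys most similar to the query, best match first."""
    ranked = sorted(range(len(keys)), key=lambda i: cosine(query, keys[i]), reverse=True)
    return ranked[:n]


# Toy data (hypothetical): 3 instance features in a bag, and a pool of M = 4 keys.
instances = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]

query = mean_pool(instances)                      # bag-level query vector
selected = match_prototypes(query, keys, n=2)     # indices of the N = 2 matched keys
print(selected)
```

In a full pipeline, each selected index would also pick out the prompt paired with that key; the sketch stops at matching because the downstream aggregation is not described in enough detail here to reproduce.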

Paper Structure

This paper contains 26 sections, 14 equations, 11 figures, 10 tables.

Figures (11)

  • Figure 1: (a) Existing visual modality framework for incremental WSI classification with buffer dependency; (b) our proposed Vision-Language-based framework with a queryable prototype pool and class feature enhancement.
  • Figure 2: The framework of QPMIL-VL. The prompts in the prototype pool enable an efficient incremental learning process by gradually capturing the visual feature descriptions of instance prototypes present in the sequential WSI datasets.
  • Figure 3: (a) ACC w.r.t. the length of each prompt ($L_{\boldsymbol{P}}$) and the number of matching keys ($N$), with prototype pool size $M=20$; (b) ACC w.r.t. $M$, with $L_{\boldsymbol{P}}=24$ and $N=5$.
  • Figure 4: (a) Prototype key matching frequency histogram; (b) prototype feature visualization.
  • Figure 5: Class features visualization, only one class per dataset. Red $\star$ is the center, i.e., the average of features, for the instances of interest (orange points, determined by the cosine similarity between instance features and learned prototype features). Blue points are non-interest instance features. Green $\circ$ and $\bullet$ denote the cosine similarities before and after feature enhancement, respectively.
  • ...and 6 more figures