Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification

Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen

TL;DR

This work tackles the weaknesses of attention-based instance importance scores in multiple-instance learning for whole-slide image classification by introducing Shapley value–based IIS estimation. It couples accelerated Shapley computation with a progressive pseudo bag augmentation framework (PMIL) and EM-based pseudo bag label optimization to improve both bag- and instance-level performance and interpretability. Across CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA, PMIL demonstrates state-of-the-art results and provides class-wise interpretability through Shapley-based IIS. The approach offers more robust training, better localization of positive instances, and practical implications for clinical decision support in computational pathology.

Abstract

In computational pathology, whole-slide image (WSI) classification is a formidable challenge because of gigapixel resolution and limited fine-grained annotations. Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains difficult. Most conventional MIL methods use attention scores to estimate instance importance scores (IIS), which contribute to the prediction of slide labels, but this often leads to skewed attention distributions and inaccurate identification of crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention while retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on estimated IIS, encouraging more balanced attention distributions in MIL models. Extensive experiments on the CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA datasets show that our method outperforms existing state-of-the-art approaches while offering enhanced interpretability and class-wise insights. Our source code is available at https://github.com/RenaoYan/PMIL.
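
To make the idea of Shapley-based instance importance concrete, the following is a minimal sketch of generic Monte Carlo permutation sampling for Shapley values. It is not the attention-accelerated computation described in the abstract. The function name `shapley_values`, the toy `value_fn` (bag score taken as the maximum instance score, with an empty coalition scoring 0), and the example scores are all assumptions made for illustration. In practice the value function would be the output of a trained MIL model on a subset of instances.

```python
import random
from typing import Callable, List, Sequence


def shapley_values(
    n: int,
    value_fn: Callable[[Sequence[int]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values for n instances by permutation sampling.

    value_fn maps a list of instance indices (a coalition) to a bag score.
    This is a generic estimator, not the paper's accelerated variant.
    """
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition: List[int] = []
        prev = value_fn(coalition)
        for i in order:
            # Marginal contribution of instance i when it joins the coalition.
            coalition.append(i)
            cur = value_fn(coalition)
            phi[i] += cur - prev
            prev = cur
    return [p / num_permutations for p in phi]


# Hypothetical per-instance scores; instance 1 plays the "positive" patch.
toy_scores = [0.1, 0.9, 0.2, 0.1]


def toy_value_fn(coalition: Sequence[int]) -> float:
    # Assumed toy bag score: max over the coalition, 0 for the empty set.
    return max((toy_scores[i] for i in coalition), default=0.0)


iis = shapley_values(len(toy_scores), toy_value_fn)
```

Because the marginal contributions telescope within each permutation, the estimated values always sum to the full-bag score minus the empty-bag score, which is a convenient sanity check on any implementation.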

Paper Structure (22 sections, 11 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Observation of attention distributions and top 5 instances of one example slide in the CAMELYON-16 dataset. (a)-(d) employ ABMIL, CLAM, DTFD, and the proposed PMIL as the MIL model, respectively. In the "Attention Distribution" column, the patch index is normalized to a range of 0 to 1 for all patches across all slides in the left sub-figure. Notably, the distribution of attention scores is skewed, with a few instances accumulating a significant share. In the "Top 5 Instances" column, positive instances (green border) are not consistently ranked in order of attention scores, as negative instances (blue border) may take precedence in the queue.
  • Figure 2: Overview of the proposed PMIL framework. (a) A collection of patches extracted from a WSI is partitioned into $M$ pseudo bags ($M$ gradually increases) based on their estimated IIS; the pseudo bags are then trained in the same manner as regular bags. (b) The weights of the MIL model are frozen to estimate IIS, facilitating pseudo bag assignment. The number $M$ of pseudo bags progressively increases at the iteration when the MIL model converges in round 0, and initial pseudo bags are assigned using IIS estimated by the MIL model in the previous round. Note that pseudo bag augmentation is used only during training.
  • Figure 3: Visualization of pseudo bag assignment using PMIL. The red annotations represent cancer regions. Even in the micro-metastasis case, which has only three positive instances, our method locates them based on the ranking of Shapley values and splits them evenly across pseudo bags. The attention ranking reveals that accurate pseudo bag augmentation leads the model to notice more positive instances during training.
  • Figure 4: Heatmaps of two slide sub-fields using different models. (a) and (b) are macro and micro metastasis cases from CAMELYON-16, where red and green annotations are cancer and noncancer regions in the column of 'Ground Truth'.
  • Figure 5: Heatmaps of one malignant tumor (MT) case from BRACS with various IIS estimations. The blue and red annotations are the malignant tumor and atypical tumor (AT) regions in the 'Ground Truth'.
  • ...and 1 more figures
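
The IIS-based partitioning described in the Figure 2 and Figure 3 captions can be sketched as follows. This is only an illustration under stated assumptions: the round-robin dealing of ranked instances, the function name `assign_pseudo_bags`, and the example scores are hypothetical and are not taken from the paper or its repository. The intent is simply to show one way that high-IIS instances could be spread evenly across $M$ pseudo bags.

```python
from typing import List, Sequence


def assign_pseudo_bags(iis: Sequence[float], num_bags: int) -> List[List[int]]:
    """Split instance indices into num_bags pseudo bags using their IIS.

    Assumed strategy: rank instances by IIS (highest first) and deal them
    round-robin, so each pseudo bag receives a similar share of top instances.
    """
    if num_bags < 1:
        raise ValueError("num_bags must be at least 1")
    ranked = sorted(range(len(iis)), key=lambda i: iis[i], reverse=True)
    bags: List[List[int]] = [[] for _ in range(num_bags)]
    for rank, idx in enumerate(ranked):
        bags[rank % num_bags].append(idx)
    return bags


# Hypothetical IIS for six patches; patches 1, 3, and 5 score highest.
example_iis = [0.05, 0.9, 0.1, 0.8, 0.02, 0.7]

# A progressive schedule would call this with a growing number of pseudo bags.
bags_m2 = assign_pseudo_bags(example_iis, 2)
bags_m3 = assign_pseudo_bags(example_iis, 3)
```

With three pseudo bags, each of the three highest-scoring patches in this toy example lands in a different bag, which mirrors the even split of positive instances shown in Figure 3.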