Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification
Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen
TL;DR
This work tackles the weaknesses of attention-based instance importance scores in multiple-instance learning for whole-slide image classification by introducing Shapley value–based IIS estimation. It couples accelerated Shapley computation with a progressive pseudo bag augmentation framework (PMIL) and EM-based pseudo bag label optimization to improve both bag- and instance-level performance and interpretability. Across CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA, PMIL demonstrates state-of-the-art results and provides class-wise interpretability through Shapley-based IIS. The approach offers more robust training, better localization of positive instances, and practical implications for clinical decision support in computational pathology.
Abstract
In computational pathology, whole-slide image (WSI) classification is a formidable challenge because the images have gigapixel resolution and fine-grained annotations are scarce. Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains difficult. Most conventional MIL methods use attention scores to estimate instance importance scores (IIS), which contribute to the prediction of slide labels, but this often leads to skewed attention distributions and inaccurate identification of crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention while retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on estimated IIS, encouraging more balanced attention distributions in MIL models. Extensive experiments on the CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA datasets show that our method outperforms existing state-of-the-art approaches while offering enhanced interpretability and class-wise insights. Our source code is available at https://github.com/RenaoYan/PMIL.
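To make the idea of Shapley-based IIS concrete, the following is a minimal sketch of how an instance's Shapley value could be estimated by plain Monte Carlo permutation sampling. It is not the attention-accelerated computation described above and is not taken from the PMIL repository. The names `shapley_iis`, `bag_score`, and `toy_bag_score` are hypothetical, and `bag_score` stands in for whatever bag-level prediction a trained MIL model would return for a subset of instances.

```python
import random
from typing import Callable, List, Sequence


def shapley_iis(
    n_instances: int,
    bag_score: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate a Shapley value per instance by permutation sampling.

    bag_score (hypothetical) maps a list of instance indices to a scalar
    bag-level score, e.g. the model's predicted probability for one class.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_instances
    order = list(range(n_instances))
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition: List[int] = []
        prev = bag_score(coalition)  # score of the empty coalition
        for idx in order:
            coalition.append(idx)
            cur = bag_score(coalition)
            # Marginal contribution of idx given the instances added before it.
            totals[idx] += cur - prev
            prev = cur
    return [t / n_permutations for t in totals]


# Toy stand-in for a MIL model: the bag score is a sum of per-instance
# evidence, so each instance's Shapley value equals its own weight.
weights = [0.0, 0.5, 0.0, 2.0]


def toy_bag_score(subset: Sequence[int]) -> float:
    return sum(weights[i] for i in subset)


iis = shapley_iis(len(weights), toy_bag_score, n_permutations=50)
ranking = sorted(range(len(iis)), key=lambda i: iis[i], reverse=True)
```

In this toy setting the ranking places instance 3 first and instance 1 second, which is the kind of ordering that a pseudo bag assignment step could consume. With a real model, each call to `bag_score` is a forward pass, so the cost grows with the number of instances times the number of permutations; this is the expense that motivates an accelerated computation.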
