Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification

Tingting Zheng; Kui Jiang; Hongxun Yao

TL;DR

Dynamic instance sampling and reinforcement learning are integrated into a unified framework to improve instance selection and feature aggregation, forming a novel Dynamic Policy Instance Selection (DPIS) scheme for better and more credible decision-making.

Abstract

Multi-Instance Learning (MIL) has shown impressive performance for histopathology whole slide image (WSI) analysis using bags or pseudo-bags. It involves instance sampling, feature representation, and decision-making. However, existing MIL-based methods suffer from one or more of the following problems: 1) high storage requirements and intensive pre-processing for numerous instances (sampling); 2) potential over-fitting when bag labels are predicted from limited knowledge (feature representation); 3) model robustness and generalizability that are affected by pseudo-bag counts and prior biases (decision-making). Clinical diagnostic practice suggests that using previously sampled instances can facilitate the final WSI analysis, but this is barely explored in prior work. To overcome these limitations, we integrate dynamic instance sampling and reinforcement learning into a unified framework to improve instance selection and feature aggregation, forming a novel Dynamic Policy Instance Selection (DPIS) scheme for better and more credible decision-making. Specifically, a feature distance measurement and a reward function are employed to boost continuous instance sampling. To alleviate over-fitting, we explore the latent global relations among instances for more robust and discriminative feature representation while establishing reward and punishment mechanisms to correct biases in pseudo-bags using contrastive learning. These strategies form the final Dynamic Policy-Driven Adaptive Multi-Instance Learning (PAMIL) method for WSI tasks. Extensive experiments reveal that our PAMIL method outperforms the state-of-the-art by 3.8\% on the CAMELYON16 dataset and 4.4\% on the TCGA lung cancer dataset.
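To make the idea of step-wise instance sampling concrete, the toy sketch below processes one bag in several sampling steps, where each step depends on the instances sampled so far. It is not the authors' DPIS scheme: the greedy "farthest from the running summary" rule is an assumed stand-in for the learned policy, mean pooling is an assumed stand-in for the Transformer-based aggregation, and all function names (`sample_bag_in_steps`, `bag_embedding`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sample_bag_in_steps(bag, group_size, num_steps, seed=0):
    """Split a bag into sequentially sampled groups of instance indices.

    Assumption (not from the paper): the first group is random, and each
    later group takes the remaining instances farthest from the mean of
    everything sampled so far. A learned policy would replace this rule.
    """
    rng = random.Random(seed)
    remaining = list(range(len(bag)))
    rng.shuffle(remaining)

    current, remaining = remaining[:group_size], remaining[group_size:]
    groups = [current]
    past = list(current)

    for _ in range(num_steps - 1):
        if not remaining:
            break
        # Summary of past samples; the feature distance is measured to it.
        summary = mean_vector([bag[i] for i in past])
        remaining.sort(key=lambda i: euclidean(bag[i], summary), reverse=True)
        current, remaining = remaining[:group_size], remaining[group_size:]
        groups.append(current)
        past.extend(current)
    return groups


def bag_embedding(bag, groups):
    """Mean-pool each group into a token, then mean-pool the tokens."""
    tokens = [mean_vector([bag[i] for i in group]) for group in groups]
    return mean_vector(tokens)


if __name__ == "__main__":
    rng = random.Random(1)
    # A synthetic bag: 20 instances with 8-dimensional features.
    bag = [[rng.random() for _ in range(8)] for _ in range(20)]
    groups = sample_bag_in_steps(bag, group_size=4, num_steps=3)
    print(groups)
    print(bag_embedding(bag, groups))
```

Because only `group_size * num_steps` instances are embedded per bag, a scheme of this shape avoids storing features for every instance, which is the storage concern listed under problem 1).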

Paper Structure (32 sections, 9 equations, 6 figures, 4 tables)

Introduction
Related Work
Multi-Instance Learning in WSI Classification
Reinforcement Learning in Medical Imaging
Proposed Method
Review for MIL Formulation
Overview for PAMIL
Dynamic Policy Instance Selection Scheme
Selection Fusion Feature Representation
Decision-Making and Feedback
Decision-Making.
Reward and Punishment.
Optimization Method
Instance Sampling Scheme Optimization.
Feature Representation and Decision-Making Optimization.
...and 17 more sections

Figures (6)

Figure 1: Different schemes for Multi-Instance Learning (MIL). (a) Each bag is treated as a group. (b) Each bag is divided into pseudo-bags. (c) The proposed Dynamic Policy-Driven Adaptive Multi-Instance Learning (PAMIL) Method.
Figure 2: Architecture of our proposed dynamic policy-driven adaptive multi-instance learning (PAMIL) framework. PAMIL consists of a dynamic policy-driven instance selection (DPIS) scheme, a selection fusion feature representation (SFFR) module, and a Transformer-based classification module (TCM). DPIS learns relationships among past, current, and remaining instances of $X_i$, and samples instances embedded as $v_i^{s_t}$ using a feature extractor (FE). Then, SFFR takes $v_i^{s_t}$ and an initial token $u_i^{s_t}$ as inputs, refining $u_i^{s_t}$ by utilizing a Transformer module (TRM) and a multi-head attention (MHA) mechanism to fuse $v_i^{s_t}$ and past tokens. Meanwhile, we introduce a Siamese (SIA) structure between $u_i^{s_t}$ and $u_i^{s_{t-1}}$ to enhance the robustness of $u_i^{s_t}$. Finally, TCM uses a class token (CLS) $h_i^{cls}$ to aggregate $\{u_{i}^{s_t}\}_{t=1}^{T}$ for inferring the probability ${\hat{Y}_i}$ of $X_i$, providing feedback as reward $r_{i}^*$ and penalty $r_i^p$ to the DPIS.
Figure 3: Illustration of the proposed selection fusion feature representation (SFFR) module, proximal policy module (PPM) and Transformer-based classification module (TCM) in PAMIL.
Figure 4: Visual comparisons with pseudo-bag-level methods on the CAMELYON16 dataset. The blue outline indicates the tumor region. The attention score bar (0-1) indicates how strongly the model focuses on tumor instances; higher values mean more attention, not a higher positive probability.
Figure 5: Visual effects of pseudo-bag sizes and grouping schemes on the CAMELYON16 dataset.
...and 1 more figures
