DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification

Wenhui Zhu; Xiwen Chen; Peijie Qiu; Aristeidis Sotiras; Abolfazl Razi; Yalin Wang

TL;DR

DGR-MIL tackles the diversity of instances in whole slide image MIL by introducing a diverse global representation composed of learnable global vectors. It employs cross-attention between instance embeddings and the global vectors, augmented with a tokenized global vector to capture discriminative context. Two learning mechanisms—positive instance alignment and a determinantal point process–based diversity loss—enforce that the global vectors are both aligned with tumor-related patterns and diverse enough to cover heterogeneous appearances. End-to-end training with a combined loss yields state-of-the-art performance on CAMELYON16 and TCGA-lung datasets across multiple feature extractors, demonstrating the practical value of modeling diversity in WSI classification.

Abstract

Multiple instance learning (MIL) is a powerful approach in weakly supervised learning, regularly employed in histological whole slide image (WSI) classification for detecting tumorous lesions. However, existing mainstream MIL methods focus on modeling correlation between instances while overlooking the inherent diversity among instances. The few MIL methods that do aim at diversity modeling empirically show inferior performance at a high computational cost. To bridge this gap, we propose a novel MIL aggregation method based on diverse global representation (DGR-MIL), which models diversity among instances through a set of global vectors that serve as a summary of all instances. First, we turn the instance correlation into the similarity between instance embeddings and the predefined global vectors through a cross-attention mechanism. This stems from the fact that similar instance embeddings typically result in a higher correlation with a certain global vector. Second, we propose two mechanisms to enforce diversity among the global vectors so that they are more descriptive of the entire bag: (i) positive instance alignment and (ii) a novel, efficient, and theoretically guaranteed diversification learning paradigm. Specifically, the positive instance alignment module encourages the global vectors to align with the center of positive instances (e.g., instances containing tumors in WSI). To further diversify the global representations, we propose a novel diversification learning paradigm leveraging the determinantal point process. The proposed model outperforms the state-of-the-art MIL aggregation models by a substantial margin on the CAMELYON16 and the TCGA-lung cancer datasets. The code is available at https://github.com/ChongQingNoSubway/DGR-MIL.
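
To make the cross-attention idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the global vectors act as queries and the instance embeddings act as both keys and values, it omits any learned projection matrices, and the names `cross_attention`, `global_vectors`, and `instances` as well as the toy numbers are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(global_vectors, instances):
    """Each global vector (query) attends over all instance embeddings.

    Assumption: instances serve as keys and values, with no learned
    projections. Returns one pooled vector and one attention row per
    global vector.
    """
    d = len(instances[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for g in global_vectors:
        # Scaled dot-product similarity between this global vector and each instance.
        scores = [sum(gi * xi for gi, xi in zip(g, x)) / scale for x in instances]
        w = softmax(scores)
        # Attention-weighted average of the instance embeddings.
        pooled = [sum(w[n] * instances[n][j] for n in range(len(instances)))
                  for j in range(d)]
        weights.append(w)
        outputs.append(pooled)
    return outputs, weights


# Toy bag (hypothetical values): three 2-D instance embeddings, two global vectors.
instances = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
global_vectors = [[4.0, 0.0], [0.0, 4.0]]
outputs, weights = cross_attention(global_vectors, instances)
```

In this toy bag, the first global vector places most of its attention on the two similar instances along the first axis, while the second global vector focuses on the third instance. This mirrors the abstract's observation that similar instance embeddings tend to correlate with the same global vector.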

Paper Structure (27 sections, 2 theorems, 18 equations, 6 figures, 7 tables)

Introduction
Related Work
Multiple instance learning in WSIs
Transformer
Methods
Global Representation in MIL Pooling
Instance Correlation as Cross Attention.
Tokenized Global Vector.
Learning Diverse Global Representation
Positive Instance Alignment.
Diversity Learning.
Objective Function
Experiments and Results
Experimental Results
Ablation Studies
...and 12 more sections

Key Result

Lemma (Kulesza and Taskar, 2012). From a geometric perspective, the determinants in Eq. (eqn:dpp) can be interpreted as the squared $|A|$-dimensional volume spanned by its feature vectors.
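
The geometric reading of the determinant can be checked numerically for two vectors, where the "volume" is the area of a parallelogram. The sketch below is an illustration only, assuming the kernel is the Gram matrix of the feature vectors; the helper names and the example vectors are hypothetical and not taken from the paper.

```python
import math


def gram_matrix(vectors):
    """Matrix of pairwise inner products between the given vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def parallelogram_area(u, v):
    """Area spanned by u and v, computed as |u| |v| sin(theta)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = dot / (norm_u * norm_v)
    return norm_u * norm_v * math.sqrt(max(0.0, 1.0 - cos_theta ** 2))


# Two unit vectors in 3-D: one orthogonal pair and one correlated pair.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
correlated = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]

det_orthogonal = det2(gram_matrix(orthogonal))
det_correlated = det2(gram_matrix(correlated))
area_correlated = parallelogram_area(*correlated)
```

For the correlated pair the Gram determinant matches the squared parallelogram area (roughly 0.64), and it is smaller than the value of 1 obtained for the orthogonal pair. This is why a larger determinant can be read as a more diverse set of vectors.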

Figures (6)

Figure 2: Overview of the proposed DGR-MIL where the global vectors are used for modeling the diversity of instances. The diverse global vectors are learned through the positive instance alignment module and the diversity learning mechanism.
Figure 3: The similarity matrix for the global vectors $\boldsymbol{G}$ learned from the CAMELYON16 dataset in two scenarios: (a) $\boldsymbol{G}$ is orthogonal and (b) $\boldsymbol{G}$ is non-orthogonal. To support Lemma (theorem:geo) and Remark (remark:div_loss), we computed the area of the parallelogram corresponding to the two highly correlated global vectors. We omitted the diagonal elements in subpanel (b), as $\boldsymbol{L}_{ii}=1, \ \forall i\in [K]$.
Figure 4: Ablation studies on (a) the number of non-tokenized global vectors on both the CAMELYON16 and TCGA-NSCLC datasets, (b) and (c) the balance parameters $\lambda_{tri}$ and $\lambda_{div}$, respectively, on the CAMELYON16 dataset, and (d) a comparison of the number of positive instances per bag.
Figure 5: Visualization of the attention map: (a) raw WSI with the ground-truth annotation, (b) the attention map computed using the tokenized global vector, and (c-g) the attention maps computed using the other ($K-1$) global vectors with $K=6$ in our experiment.
Figure 6: (a) Examples of positive instances with within-bag and between-bag diversities measured by rate-distortion theory. (b) Histogram of the diversity measure within positive bags on the CAMELYON16 dataset. (c) The between-bag distinction measures the pair-wise similarity between bags.
...and 1 more figure

Theorems & Definitions (4)

Lemma
Theorem
Proof
Remark
