SC-MIL: Sparsely Coded Multiple Instance Learning for Whole Slide Image Classification

Peijie Qiu, Pan Xiao, Wenhui Zhu, Yalin Wang, Aristeidis Sotiras

TL;DR

This work tackles weakly supervised WSI classification by jointly enhancing instance embeddings and cross-instance correlations through sparse dictionary learning (SDL). It unrolls ISTA-based sparse coding into a differentiable SC module that plugs into any MIL framework and learns a global over-complete dictionary, a per-instance sparsity level, and a step size end to end. Results on classical MIL benchmarks and WSI datasets (CAMELYON16, TCGA-NSCLC) show consistent improvements in accuracy and AUC, along with better tumor localization and more robust instance representations. The module is a training-efficient, modular enhancement that improves performance even when self-supervised pretraining is limited, and it remains compatible with diverse MIL aggregators.
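As a point of reference, the textbook form of the sparse coding problem and the ISTA iteration that gets unrolled can be written as follows. The notation is an assumption made for illustration, not taken from the paper: $x_i$ is an instance embedding, $D$ the dictionary, $\alpha_i$ the sparse code, $\lambda_i$ the per-instance sparsity weight, and $\eta$ the step size. The paper's own equations may use different symbols or a different parameterization.

```latex
% Sparse coding objective for one instance (standard LASSO form)
\min_{\alpha_i} \; \tfrac{1}{2}\,\lVert x_i - D\alpha_i \rVert_2^2 + \lambda_i \lVert \alpha_i \rVert_1

% One ISTA iteration: gradient step on the quadratic term, then soft-thresholding
\alpha_i^{(t+1)} = \mathcal{S}_{\eta\lambda_i}\!\left( \alpha_i^{(t)} - \eta\, D^{\top}\!\left( D\alpha_i^{(t)} - x_i \right) \right),
\qquad
\mathcal{S}_{\theta}(z) = \operatorname{sign}(z)\,\max\!\left( \lvert z \rvert - \theta,\, 0 \right)
```

Unrolling treats a fixed number of these iterations as network layers, so that $D$, $\lambda_i$, and $\eta$ can be trained by backpropagation instead of being set by hand.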

Abstract

Multiple Instance Learning (MIL) has been widely used in weakly supervised whole slide image (WSI) classification. Typical MIL methods include a feature embedding part, which embeds the instances into features via a pre-trained feature extractor, and an MIL aggregator that combines instance embeddings into predictions. Most efforts have improved these two parts separately: refining the feature embeddings through self-supervised pre-training, and modeling the correlations between instances. In this paper, we propose a sparsely coded MIL (SC-MIL) method that addresses both aspects at the same time by leveraging sparse dictionary learning. Sparse dictionary learning captures the similarities of instances by expressing them as sparse linear combinations of atoms in an over-complete dictionary. In addition, imposing sparsity improves instance feature embeddings by suppressing irrelevant instances while retaining the most relevant ones. To make the conventional sparse coding algorithm compatible with deep learning, we unroll it into a sparsely coded (SC) module using deep unrolling. The proposed SC module can be incorporated into any existing MIL framework in a plug-and-play manner with an acceptable computational cost. Experimental results on multiple datasets demonstrate that the proposed SC module can substantially boost the performance of state-of-the-art MIL methods. The code is available at https://github.com/sotiraslab/SCMIL.git.
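To make the sparse coding step concrete, here is a minimal NumPy sketch of plain ISTA applied to a bag of instance embeddings. It is not the authors' implementation: in the paper the dictionary, the per-instance sparsity, and the step size are learned end to end through the unrolled layers, whereas here they are fixed inputs. The function names (`soft_threshold`, `ista_sparse_code`, `lasso_objective`) and all shapes are hypothetical choices for illustration.

```python
import numpy as np


def soft_threshold(z, theta):
    """Element-wise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)


def lasso_objective(X, D, A, lam):
    """0.5 * ||X - A D^T||_F^2 + sum_i lam_i * ||a_i||_1."""
    return 0.5 * np.sum((X - A @ D.T) ** 2) + np.sum(lam * np.abs(A))


def ista_sparse_code(X, D, lam, step, n_iters=50):
    """Run a fixed number of ISTA iterations (assumed, non-learned variant).

    X:    (n_instances, d) instance embeddings
    D:    (d, k) dictionary; k > d makes it over-complete
    lam:  scalar or (n_instances, 1) per-instance sparsity weight
    step: step size, at most 1 / sigma_max(D)^2 for guaranteed descent
    Returns A of shape (n_instances, k) with X approximately equal to A @ D.T.
    """
    A = np.zeros((X.shape[0], D.shape[1]))
    for _ in range(n_iters):
        # Gradient of the quadratic reconstruction term with respect to A.
        grad = (A @ D.T - X) @ D
        # Gradient step followed by shrinkage toward zero.
        A = soft_threshold(A - step * grad, step * lam)
    return A


# Toy bag: 8 instances, 16-dim embeddings, 64 atoms (all values are made up).
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
X = rng.standard_normal((8, 16))
lam = np.full((8, 1), 0.1)                      # one sparsity weight per instance
step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / largest singular value squared

A = ista_sparse_code(X, D, lam, step)
print("fraction of zero coefficients:", np.mean(A == 0.0))
```

In an unrolled version, each loop iteration would become one network layer, and `D`, `lam`, and `step` would be trainable parameters or the outputs of small sub-networks, in the spirit of the $\lambda$ learning module the paper describes.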

Paper Structure

This paper contains 32 sections, 6 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: The workflow of the proposed SC-MIL framework in WSI classification. The sparse coding (SC) module conducts end-to-end unrolled sparse dictionary learning and can be easily integrated into any multiple instance learning (MIL) framework in a plug-and-play fashion.
  • Figure 2: The proposed SC module: (a) The unrolled ISTA learning scheme of the sparse dictionary learning; (b) The $\lambda$ learning module, which is implemented as a feed-forward network; (c) A single network layer of the unrolling network for sparse dictionary learning.
  • Figure 3: Comparison between sparse coding and low-rank projection (ILRA).
  • Figure 4: Tumor localization on CAMELYON16 using the ABMIL-Gated aggregator: (a) the attention map from ABMIL-Gated without SC, and (b) the attention map from ABMIL-Gated with SC. The red contours denote the ground-truth annotations of tumors. Each blue square represents the attention score for one WSI patch, where a brighter color signifies a higher attention score.
  • Figure 5: Visualization of the instance-level feature space using features extracted by a ResNet-18 on the CAMELYON16 testing set: (a) 512-dimensional features taken after the first linear layer of a standard ABMIL; (b) 256-dimensional low-rank features from ILRA; (c) 256-dimensional sparse coefficients produced by the proposed SC module in an ABMIL.