MsaMIL-Net: An End-to-End Multi-Scale Aware Multiple Instance Learning Network for Efficient Whole Slide Image Classification

Jiangping Wen; Jinyu Wen; Meie Fang

TL;DR

MsaMIL-Net targets the main obstacles to end-to-end WSI classification by integrating semantic lesion filtering, multi-scale feature extraction, and cross-scale instance-aware fusion within a differentiable MIL framework. It is trained end to end, which lets it jointly optimize the feature extractors and the MIL components across three native magnifications ($20\times$, $10\times$, $5\times$). This improves ACC and AUC on the DigestPath2019, BCNB, and UBC-OCEAN datasets. The key contributions are a semantic feature filtering module (SFFM) that reduces interference from non-lesion regions, a multi-scale feature extraction module (MSFEM) that aligns semantics across scales, and the IAAM, which uses MHE and DMQ for robust cross-scale aggregation. Together, these components reach state-of-the-art performance while lowering computational cost, because only lesion areas are processed. Overall, the work shows that end-to-end multi-scale MIL, combined with lesion filtering and attention-based fusion, can substantially improve WSI classification while remaining practical to compute.
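To make the idea of lesion filtering combined with multi-scale processing concrete, here is a minimal sketch built on assumptions the paper does not state. It assumes every patch has the same pixel size at its own magnification, that a lesion score is already available for each $5\times$ patch (how the SFFM actually scores patches is not described here), and that a fixed threshold of 0.5 is used. The helpers `children_at` and `select_lesion_patches` are hypothetical names. The sketch only shows how keeping a subset of low-magnification patches reduces the number of high-magnification patches that must be encoded.

```python
from typing import Dict, List, Tuple

# Hypothetical setup: each patch covers the same number of pixels at its own magnification,
# so one 5x patch spans 2x2 patches at 10x and 4x4 patches at 20x.
SCALES = (5, 10, 20)  # native magnifications named in the paper

Coord = Tuple[int, int]  # (row, col) index on the patch grid of one magnification


def children_at(coord: Coord, src_mag: int, dst_mag: int) -> List[Coord]:
    """Grid indices at dst_mag that cover the same tissue as one patch at src_mag.

    Assumes dst_mag is an integer multiple of src_mag (e.g. 5x -> 20x gives factor 4).
    """
    factor = dst_mag // src_mag
    r, c = coord
    return [(r * factor + i, c * factor + j) for i in range(factor) for j in range(factor)]


def select_lesion_patches(scores: Dict[Coord, float], threshold: float = 0.5) -> Dict[int, List[Coord]]:
    """Keep 5x patches whose (hypothetical) lesion score reaches the threshold,
    then expand each kept patch to the 10x and 20x patches it covers."""
    kept = sorted(coord for coord, s in scores.items() if s >= threshold)
    return {mag: [child for coord in kept for child in children_at(coord, 5, mag)] for mag in SCALES}


if __name__ == "__main__":
    # Toy 2x2 grid of 5x patches with made-up lesion scores.
    scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.7, (1, 1): 0.2}
    selected = select_lesion_patches(scores)
    print({mag: len(v) for mag, v in selected.items()})  # {5: 2, 10: 8, 20: 32}
```

With four candidate $5\times$ patches, encoding the full grid would require 64 patches at $20\times$. Keeping only the two above-threshold patches halves that to 32, and the savings grow as the lesion fraction of a slide shrinks.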

Abstract

Bag-based Multiple Instance Learning (MIL) has become the mainstream approach to Whole Slide Image (WSI) classification. However, most existing methods adopt a segmented (two-stage) training strategy: a pre-trained feature extractor first extracts patch features, and a MIL network then aggregates them. Because the two networks are trained separately, they cannot be jointly optimized end to end, and this limits the overall performance of the model. In addition, conventional methods extract features from every fixed-size patch, ignoring the multi-scale way in which pathologists examine slides. This wastes substantial computation when tumor regions occupy only a small fraction of the slide (as in the Camelyon16 dataset) and may also lead the model to suboptimal solutions. To address these limitations, this paper proposes an end-to-end multi-scale WSI classification framework that integrates multi-scale feature extraction with multiple instance learning. Specifically, our approach includes: (1) a semantic feature filtering module to reduce interference from non-lesion areas; (2) a multi-scale feature extraction module to capture pathological information at different levels; and (3) a multi-scale fusion MIL module for global modeling and feature integration. With an end-to-end training strategy, we optimize the feature extractor and the MIL network simultaneously, which maximizes their compatibility. Experiments on three cross-center datasets (DigestPath2019, BCNB, and UBC-OCEAN) show that the proposed method outperforms existing state-of-the-art approaches in both accuracy (ACC) and AUC.
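The abstract does not specify how the fusion module aggregates instances. The following stand-in therefore uses standard attention-based MIL pooling (Ilse et al., 2018), not the paper's IAAM, MHE, or DMQ. This minimal sketch assumes that instances from all three magnifications share one feature dimension and are pooled together as a single bag. The names `attention_mil_pool`, the projection `V`, and the attention vector `w` are hypothetical. The sketch covers only the forward pass in NumPy. In an end-to-end setting, the instance features would come from a trainable extractor, and the classification loss would backpropagate through the attention weights into that extractor.

```python
import numpy as np


def attention_mil_pool(bags, V, w):
    """Attention-based MIL pooling (Ilse et al., 2018) over a multi-scale bag.

    bags: dict mapping magnification -> (n_instances, d) feature matrix.
    V: (h, d) projection matrix; w: (h,) attention vector.
    Returns the pooled (d,) bag embedding and one attention weight per instance.
    """
    # Stack instances from all scales into a single bag (assumes a shared feature dimension d).
    H = np.concatenate([bags[m] for m in sorted(bags)], axis=0)
    logits = np.tanh(H @ V.T) @ w        # (n_total,) unnormalized attention scores
    logits = logits - logits.max()       # shift for a numerically stable softmax
    a = np.exp(logits) / np.exp(logits).sum()
    return a @ H, a                      # weighted sum of instances, attention weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, h = 8, 4
    # Made-up features: 2 instances at 5x, 8 at 10x, 32 at 20x.
    bags = {5: rng.normal(size=(2, d)), 10: rng.normal(size=(8, d)), 20: rng.normal(size=(32, d))}
    V, w = rng.normal(size=(h, d)), rng.normal(size=h)
    z, a = attention_mil_pool(bags, V, w)
    print(z.shape, a.shape, round(float(a.sum()), 6))  # (8,) (42,) 1.0
```

A hypothetical linear head applied to `z` would then produce the slide-level prediction. Pooling all scales as one bag is the simplest choice. A dedicated cross-scale module, such as the one the paper proposes, can instead model how instances at different magnifications relate to each other.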
