Revisiting Data Challenges of Computational Pathology: A Pack-based Multiple Instance Learning Training Framework

Wenhao Tang, Heng Fang, Ge Wu, Xiang Li, Ming-Ming Cheng

TL;DR

The paper tackles data challenges in computational pathology arising from extremely long and heterogeneous WSIs with limited supervision. It introduces PackMIL, a pack-based MIL training framework that forms fixed-length packs from variable-length feature sequences, uses a main branch for slide-level supervision and a residual hyperslide branch for inter-slide supervision, and employs an attention-driven downsampler to reduce redundancy. Task-specific hyperslide labels and losses enable multi-slide supervision across grading, subtyping, and survival analyses, while adaptive packing and robust masks preserve data heterogeneity. Empirical results show PackMIL delivers notable accuracy gains (up to 8%) and substantial training speedups (around 8x) on large-scale PANDA data, with consistent gains across tasks and encoders, highlighting the value of addressing data challenges in the foundation model (FM) era of CPath.

Abstract

Computational pathology (CPath) digitizes pathology slides into whole slide images (WSIs), enabling analysis for critical healthcare tasks such as cancer diagnosis and prognosis. However, WSIs possess extremely long sequence lengths (up to 200K), significant length variations (from 200 to 200K), and limited supervision. These extreme variations in sequence length lead to high data heterogeneity and redundancy. Conventional methods often compromise on training efficiency and optimization to preserve such heterogeneity under limited supervision. To comprehensively address these challenges, we propose a pack-based MIL framework. It packs multiple sampled, variable-length feature sequences into fixed-length ones, enabling batched training while preserving data heterogeneity. Moreover, we introduce a residual branch that composes discarded features from multiple slides into a hyperslide, which is trained with tailored labels. It offers multi-slide supervision while mitigating feature loss from sampling. Meanwhile, an attention-driven downsampler is introduced to compress features in both branches to reduce redundancy. By alleviating these challenges, our approach achieves an accuracy improvement of up to 8% while using only 12% of the training time on PANDA (UNI). Extensive experiments demonstrate that focusing on data challenges in CPath holds significant potential in the era of foundation models. The code is available at https://github.com/FangHeng/PackMIL
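
The packing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy first-fit policy, the truncation of over-long sequences, the block-diagonal mask, and the names `sample_split`, `pack_sequences`, `segment_mask`, and `PAD` are all assumptions made for illustration. Each slide is split into kept and discarded instance indices, and variable-length sequences are then placed into fixed-length packs, with padding marked so that a mask can stop instances of different slides from attending to each other.

```python
import random

PAD = -1  # hypothetical marker for padded positions


def sample_split(n_instances, keep_ratio, rng):
    """Randomly split one slide's instance indices into kept and discarded sets."""
    idx = list(range(n_instances))
    rng.shuffle(idx)
    n_keep = max(1, int(n_instances * keep_ratio))
    return sorted(idx[:n_keep]), sorted(idx[n_keep:])


def pack_sequences(lengths, pack_len):
    """Greedy first-fit packing (an assumed policy, not taken from the paper).

    Returns one list per pack holding the slide id at each position,
    with PAD filling the unused tail.
    """
    packs, free = [], []
    for slide_id, length in enumerate(lengths):
        length = min(length, pack_len)  # assumption: truncate over-long sequences
        for p, room in enumerate(free):
            if room >= length:
                break  # first pack with enough room
        else:
            packs.append([])  # no pack fits: open a new one
            free.append(pack_len)
            p = len(packs) - 1
        packs[p].extend([slide_id] * length)
        free[p] -= length
    return [pack + [PAD] * (pack_len - len(pack)) for pack in packs]


def segment_mask(pack):
    """Block-diagonal mask: position i may attend to j only within the same slide."""
    return [[a == b and a != PAD for b in pack] for a in pack]


kept, discarded = sample_split(10, keep_ratio=0.5, rng=random.Random(0))
packs = pack_sequences([3, 5, 2, 4], pack_len=6)
print(packs)  # [[0, 0, 0, 2, 2, -1], [1, 1, 1, 1, 1, -1], [3, 3, 3, 3, -1, -1]]
```

In this sketch the same `pack_sequences` call could be applied to the discarded indices as well, which is one way to picture how leftover features from several slides end up sharing a pack; how the paper labels such a hyperslide is task-specific and is not modeled here.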

Paper Structure

This paper contains 37 sections, 18 equations, 6 figures, 22 tables.

Figures (6)

  • Figure 1: (a, b): WSIs present significant data challenges, including high heterogeneity stemming from highly variable sequence lengths and diverse morphology, massive data redundancy, and limited supervision. (c): Conventional methods train with a batch size of 1 to preserve data heterogeneity, and so suffer from training inefficiency and instability. (d): Our pack-based framework packs variable-length sequences to preserve scale information. It further introduces a residual branch to model inter-slide correlations, constructing a hyperslide that retains all morphological features and enriches limited supervision. This approach maintains data heterogeneity while enabling batched training.
  • Figure 2: Impact of data heterogeneity on CPath. RS denotes randomly sampling the instances of every bag to a fixed length while keeping the original label, which discards data heterogeneity.
  • Figure 3: Left: Overview of the proposed pack-based MIL training framework. Instance features from each WSI are sampled into kept and discarded sequences. Both sequences are processed by ADS. Downsampled sequences from different bags are then concatenated into fixed-length packs. This packing mechanism aggregates different WSIs into fixed-length packs, thereby enabling batched training. The dual-branch architecture, with shared weights, processes: 1) the Main Branch, supervised by slide-level labels, and 2) the Residual Branch, supervised by hyperslide labels. Right: Architecture of the Attention-driven Downsampler (ADS). Pseudocode is provided in the Supplementary.
  • Figure 4: Illustration of Task-specific Hyperslide Labels. A hyperslide is a conceptual pack formed by aggregating multiple WSIs to enable inter-slide supervision. These task-specific labels are constructed based on the clinical characteristics of each task.
  • Figure 5: Practical Guidelines for Batched CPath Training. Impact of training hyperparameters, highlighting dataset-scale-dependent strategies. (a) On the large-scale PANDA dataset, accuracy is non-monotonic in the batch size $bs$ and degrades when $bs$ is too large. (b) On the smaller TCGA dataset, computational resource limitations necessitate an empirically tuned trade-off between $bs$ and the number of instances. (c) The $\sqrt{bs}$ learning rate scaling rule is effective for PANDA (bottom) but fails on TCGA (top), showing that standard rules are not universal in CPath.
  • ...and 1 more figure
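
The $\sqrt{bs}$ learning rate scaling rule named in the Figure 5 caption can be written as a short sketch. The function name `sqrt_scaled_lr`, the reference batch size of 1, and the example base rate are assumptions for illustration only and are not values from the paper.

```python
import math


def sqrt_scaled_lr(base_lr, batch_size, base_batch_size=1):
    """Scale a learning rate tuned at base_batch_size by sqrt(bs / base_bs)."""
    if batch_size <= 0 or base_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * math.sqrt(batch_size / base_batch_size)


# Hypothetical base rate tuned at batch size 1, then scaled for larger batches.
for bs in (1, 4, 16, 64):
    print(bs, sqrt_scaled_lr(1e-4, bs))
```

Per the caption, this rule is reported to work on PANDA but not on TCGA, so the scaled value is better treated as a starting point for tuning than as a fixed setting.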