Revisiting Data Challenges of Computational Pathology: A Pack-based Multiple Instance Learning Training Framework
Wenhao Tang, Heng Fang, Ge Wu, Xiang Li, Ming-Ming Cheng
TL;DR
The paper tackles data challenges in computational pathology (CPath) that arise from extremely long, highly heterogeneous whole slide images (WSIs) with limited supervision. It introduces PackMIL, a pack-based multiple instance learning (MIL) training framework that packs variable-length feature sequences into fixed-length ones. A main branch provides slide-level supervision, a residual hyperslide branch provides inter-slide supervision, and an attention-driven downsampler reduces redundancy. Task-specific hyperslide labels and losses enable multi-slide supervision for grading, subtyping, and survival analysis, while adaptive packing and robust masks preserve data heterogeneity. Empirically, PackMIL delivers accuracy gains of up to 8% and roughly 8x faster training (about 12% of the training time) on the large-scale PANDA dataset, with consistent gains across tasks and feature encoders, highlighting the value of addressing data challenges in the foundation model (FM) era of CPath.
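The packing idea can be illustrated with a minimal sketch, assuming each slide is already a list of patch-feature vectors. The function `pack_slides`, its parameters `pack_len` and `max_per_slide`, the random subsampling, and the first-fit placement are all hypothetical choices for illustration, not the PackMIL implementation. Each slide is subsampled to at most `max_per_slide` instances, the unsampled features are returned separately (the kind of input a residual branch could consume), and the sampled sequences are concatenated into fixed-length packs together with a slide-id mask, so that attention or pooling can be restricted to tokens from the same slide.

```python
import random
from typing import List, Tuple

Feature = List[float]


def pack_slides(
    slides: List[List[Feature]],
    pack_len: int,
    max_per_slide: int,
    seed: int = 0,
) -> Tuple[List[List[Feature]], List[List[int]], List[List[Feature]]]:
    """Hypothetical first-fit packing of variable-length slides.

    Returns (packs, masks, discarded):
      packs[p]: exactly pack_len feature vectors (zero-padded)
      masks[p][i]: index of the slide token i came from, or -1 for padding
      discarded[s]: features of slide s that were not sampled
    """
    rng = random.Random(seed)
    dim = len(slides[0][0])
    packs: List[List[Feature]] = []
    masks: List[List[int]] = []
    discarded: List[List[Feature]] = []
    for sid, feats in enumerate(slides):
        # Randomly sample at most max_per_slide instances (kept in original order);
        # the rest are set aside for a residual branch.
        order = list(range(len(feats)))
        rng.shuffle(order)
        keep_n = min(len(feats), max_per_slide, pack_len)
        kept = [feats[i] for i in sorted(order[:keep_n])]
        discarded.append([feats[i] for i in sorted(order[keep_n:])])
        # First fit: place the slide in the first pack with enough free room.
        for p, mask in enumerate(masks):
            if pack_len - len(mask) >= keep_n:
                packs[p].extend(kept)
                mask.extend([sid] * keep_n)
                break
        else:
            # No existing pack has room, so open a new one.
            packs.append(list(kept))
            masks.append([sid] * keep_n)
    # Pad every pack to the fixed length so packs can be stacked into one batch.
    for pack, mask in zip(packs, masks):
        pad = pack_len - len(pack)
        pack.extend([[0.0] * dim for _ in range(pad)])
        mask.extend([-1] * pad)
    return packs, masks, discarded


# Usage: four slides with 3, 10, 2 and 5 two-dimensional features.
slides = [[[float(s), float(i)] for i in range(n)] for s, n in enumerate([3, 10, 2, 5])]
packs, masks, discarded = pack_slides(slides, pack_len=8, max_per_slide=6)
print(masks)
# [[0, 0, 0, 2, 2, -1, -1, -1], [1, 1, 1, 1, 1, 1, -1, -1], [3, 3, 3, 3, 3, -1, -1, -1]]
```

Here the 10-instance slide is cut to 6 sampled features, and its 4 leftover features end up in `discarded[1]`. Short slides share a pack, which is what allows batched training without forcing every slide to the same length.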
Abstract
Computational pathology (CPath) digitizes pathology slides into whole slide images (WSIs), enabling analysis for critical healthcare tasks such as cancer diagnosis and prognosis. However, WSIs have extremely long sequence lengths (up to 200K instances), large length variations (from 200 to 200K), and limited supervision. These extreme lengths and variations lead to high data heterogeneity and redundancy. To preserve such heterogeneity under limited supervision, conventional methods often compromise on training efficiency and optimization. To address these challenges comprehensively, we propose a pack-based MIL framework. It packs multiple sampled, variable-length feature sequences into fixed-length ones, enabling batched training while preserving data heterogeneity. Moreover, we introduce a residual branch that composes the features discarded by sampling from multiple slides into a hyperslide, which is trained with tailored labels. This branch offers multi-slide supervision while mitigating the feature loss caused by sampling. Meanwhile, an attention-driven downsampler compresses features in both branches to reduce redundancy. By alleviating these challenges, our approach achieves an accuracy improvement of up to 8% while using only 12% of the training time on PANDA (UNI features). Extensive experiments demonstrate that focusing on data challenges in CPath holds significant potential in the era of foundation models. The code is available at https://github.com/FangHeng/PackMIL
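The hyperslide branch and the downsampler can be sketched together under loudly stated assumptions. The names `attention_topk` and `build_hyperslide` are hypothetical. The downsampler here scores each instance with a softmax over dot products with a fixed vector `w` (standing in for a learned attention head) and keeps the top-k instances; the real module may compress features differently. The hyperslide label is a soft label that gives each class mass proportional to the number of features its slides contribute. This is an illustrative assumption only, not the tailored labels described in the paper, which are task-specific.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = List[float]


def attention_topk(feats: Sequence[Feature], w: Sequence[float], k: int) -> List[Feature]:
    """Hypothetical attention-driven downsampler.

    Scores each instance with a softmax over dot products with w and keeps
    the k highest-scoring instances, preserving their original order.
    """
    logits = [sum(a * b for a, b in zip(f, w)) for f in feats]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    attn = [e / total for e in exps]
    top = sorted(range(len(feats)), key=lambda i: attn[i], reverse=True)[:k]
    return [feats[i] for i in sorted(top)]


def build_hyperslide(
    discarded: List[List[Feature]],
    labels: List[int],
    num_classes: int,
    w: Sequence[float],
    k: int,
) -> Tuple[List[Feature], List[float]]:
    """Compose the discarded features of several slides into one hyperslide.

    The soft label (class mass proportional to contributed features) is an
    illustrative assumption, not PackMIL's tailored hyperslide label.
    """
    tokens: List[Feature] = []
    counts: Dict[int, int] = {}
    for feats, y in zip(discarded, labels):
        tokens.extend(feats)
        counts[y] = counts.get(y, 0) + len(feats)
    n = sum(counts.values())
    soft = [counts.get(c, 0) / n for c in range(num_classes)]
    # Compress the hyperslide with the same downsampler used for the main branch.
    return attention_topk(tokens, w, k), soft


# Usage: leftover features from three slides with class labels 0, 1 and 0.
discarded = [[[1.0, 0.0], [0.2, 0.0]], [[0.0, 1.0]], [[3.0, 0.0], [0.5, 0.0], [0.1, 0.0]]]
tokens, soft = build_hyperslide(discarded, labels=[0, 1, 0], num_classes=2, w=[1.0, 0.4], k=3)
print(tokens)  # [[1.0, 0.0], [3.0, 0.0], [0.5, 0.0]]
```

In this sketch the same `attention_topk` call could also be applied to each slide's sampled features before packing, mirroring the idea of one downsampler shared by both branches. Because softmax is monotonic, ranking by `attn` gives the same order as ranking by the raw logits; the normalized weights are computed only to mirror attention pooling.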
