From Correlation to Causation: Max-Pooling-Based Multi-Instance Learning Leads to More Robust Whole Slide Image Classification
Xin Liu, Weijia Zhang, Wei Tang, Thuc Duy Le, Jiuyong Li, Lin Liu, Min-Ling Zhang
TL;DR
The paper analyzes why attention-based MIL is vulnerable to spurious correlations in whole-slide image analysis and reframes max-pooling MIL through a causal lens. It shows that, under mild assumptions, max-pooling can isolate causal content factors while ignoring environmental biases, but that existing max-pooling models suffer from rote memorization and instability on hard instances. To address this, it introduces FocusMIL, which couples variational information bottleneck regularization with a multi-slide (multi-bag) mini-batch training strategy to stabilize training and suppress memorization. Experiments on three real-world datasets and one semi-synthetic dataset show that FocusMIL achieves better out-of-distribution generalization and more accurate instance-level tumor localization, supporting both the causal perspective and the method's practical robustness.
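To make the max-pooling aggregation rule concrete, here is a minimal NumPy sketch of a max-pooling MIL forward pass for a single slide. It assumes precomputed patch embeddings and a hypothetical linear instance classifier (`w`, `b`); the function names and toy data are illustrative and not taken from the paper. The slide prediction is determined entirely by the highest-scoring patch, which is also the natural candidate for tumor localization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def max_pooling_mil(instance_features, w, b):
    """Max-pooling MIL forward pass for one bag (slide).

    instance_features: (n_instances, d) array of patch embeddings.
    w, b: parameters of a hypothetical linear instance classifier.
    Returns the bag probability, the per-instance logits, and the index of
    the instance that determines the bag prediction.
    """
    logits = instance_features @ w + b   # one tumor logit per patch
    top = int(np.argmax(logits))         # most tumor-like patch
    return sigmoid(logits[top]), logits, top

def bag_bce(prob, label, eps=1e-7):
    """Binary cross-entropy on the bag-level (slide-level) prediction."""
    prob = np.clip(prob, eps, 1 - eps)
    return -(label * np.log(prob) + (1 - label) * np.log(1 - prob))

# Toy slide with two patches: dimension 0 acts as a "tumor" feature and
# dimension 1 as a hypothetical nuisance feature given zero weight.
feats = np.array([[0.5, 3.0], [2.0, -1.0]])
w = np.array([1.0, 0.0])
prob, logits, top = max_pooling_mil(feats, w, 0.0)

# Add a patch with an extreme nuisance value but a lower tumor logit.
feats_aug = np.vstack([feats, [[1.0, 100.0]]])
prob_aug, _, top_aug = max_pooling_mil(feats_aug, w, 0.0)
```

Because only the maximum logit matters, appending patches that score below the current maximum leaves the slide prediction unchanged. Whether a trained model actually learns to down-weight bias features is the paper's theoretical claim; this toy example simply hard-codes a zero weight for the nuisance dimension.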
Abstract
In whole slide image (WSI) analysis, attention-based multi-instance learning (MIL) models are susceptible to spurious correlations and degrade under domain shift. These methods may assign high attention weights to non-tumor regions, such as areas affected by staining bias or artifacts, leading to unreliable tumor region localization. In this paper, we revisit max-pooling-based MIL methods from a causal perspective. Under mild assumptions, our theoretical results show that max-pooling encourages the model to focus on causal factors while ignoring bias-related factors. Furthermore, we find that existing max-pooling-based methods may overfit the training set by rote memorization of instance features and fail to learn meaningful patterns. To address these issues, we propose FocusMIL, which couples max-pooling with an instance-level variational information bottleneck (VIB) to learn compact, predictive latent representations, and employs a multi-bag mini-batch scheme to stabilize optimization. We conduct comprehensive experiments on three real-world datasets and one semi-synthetic dataset. The results show that, by capturing causal factors, FocusMIL exhibits significant advantages in out-of-distribution scenarios and in instance-level tumor region localization.
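The combination of max-pooling, an instance-level VIB, and multi-bag mini-batches can be sketched as a single training loss. The NumPy code below is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard deep VIB formulation (a Gaussian encoder regularized toward a standard normal prior through a KL term weighted by `beta`), a hypothetical linear encoder and instance classifier, and a mini-batch of several slides with different numbers of patches. The function and parameter names (`max_pool_vib_loss`, `W_mu`, `W_logvar`, `v`, `c`, `beta`) are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(x, W_mu, W_logvar):
    """Hypothetical linear Gaussian encoder q(z|x) = N(mu, diag(exp(logvar)))."""
    return x @ W_mu, x @ W_logvar

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), one value per instance."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar, axis=-1)

def max_pool_vib_loss(bags, labels, params, beta, rng):
    """Loss over a mini-batch of several bags (slides) processed together.

    bags: list of (n_i, d) arrays; bag sizes may differ.
    labels: sequence of 0/1 slide labels.
    params: dict with hypothetical keys 'W_mu', 'W_logvar', 'v', 'c'.
    beta: weight of the KL (information bottleneck) term.
    """
    bce_terms, kl_terms = [], []
    for x, y in zip(bags, labels):
        mu, logvar = encode(x, params["W_mu"], params["W_logvar"])
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps         # reparameterization trick
        logits = z @ params["v"] + params["c"]      # instance tumor logits
        p = np.clip(sigmoid(np.max(logits)), 1e-7, 1 - 1e-7)  # max-pooling
        bce_terms.append(-(y * np.log(p) + (1 - y) * np.log(1 - p)))
        kl_terms.append(kl_to_standard_normal(mu, logvar))   # per-instance KL
    bce = np.mean(bce_terms)                 # average over bags in the batch
    kl = np.mean(np.concatenate(kl_terms))   # average over all instances
    return bce + beta * kl, bce, kl

# Toy mini-batch of four slides with different patch counts.
rng = np.random.default_rng(0)
d, k = 16, 4
params = {
    "W_mu": 0.1 * rng.standard_normal((d, k)),
    "W_logvar": 0.1 * rng.standard_normal((d, k)),
    "v": rng.standard_normal(k),
    "c": 0.0,
}
bags = [rng.standard_normal((n, d)) for n in (30, 45, 12, 60)]
labels = [1, 0, 1, 0]
beta = 1e-3
total, bce, kl = max_pool_vib_loss(bags, labels, params, beta, rng)
```

Averaging the KL over every instance in the batch, rather than per bag, is one design choice among several; the paper's exact weighting, encoder architecture, and value of `beta` are not given in this section. NumPy is used only for readability; training would require an autodiff framework. Processing several slides per step lets each update mix positive and negative bags, which is one plausible reason a multi-bag scheme would stabilize optimization.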
