"No negatives needed": weakly-supervised regression for interpretable tumor detection in whole-slide histopathology images
Marina D'Amato, Jeroen van der Laak, Francesco Ciompi
TL;DR
This work reframes weakly-supervised tumor detection in whole-slide histopathology as a regression problem: the model predicts tumor percentage rather than binary tumor presence. It develops a MIL-based regression framework with four instance-based variants (MeanPool, ABMIL, CLAM, WeSEG), analyzes their robustness to synthetic noise in the regression target, and introduces a fifth-root target amplification to improve detection of small lesions. Across five diverse datasets, the approach achieves strong correlations between predicted and true tumor percentages and competitive tumor-detection performance; ABMIL and CLAM benefit most from amplification and remain robust under noise. The study offers interpretable insights via instance-level logit maps and attention heatmaps, discusses the limitations of attention in regression tasks, and outlines practical implications for scalable tumor detection that requires neither negative examples nor precise pixel-level annotations.
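To illustrate why a fifth-root transform helps with small lesions, the sketch below applies it to a tumor fraction in [0, 1] and inverts it again. This is a minimal illustration, not the authors' implementation: the function names `amplify` and `deamplify`, the clipping of predictions, and the choice to work with fractions rather than percentages are all assumptions made here.

```python
def amplify(tumor_fraction: float, root: int = 5) -> float:
    """Map a tumor fraction in [0, 1] to an amplified regression target.

    Hypothetical helper: the root transform stretches values near zero,
    so very small tumor fractions produce clearly non-zero targets.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie in [0, 1]")
    return tumor_fraction ** (1.0 / root)


def deamplify(prediction: float, root: int = 5) -> float:
    """Invert the amplification to recover a tumor fraction.

    Predictions are clipped to [0, 1] first (an assumption made here),
    so that the power is always taken of a valid amplified value.
    """
    clipped = min(max(prediction, 0.0), 1.0)
    return clipped ** root


if __name__ == "__main__":
    # Compare raw and amplified targets for small and large tumor fractions.
    for fraction in (0.001, 0.01, 0.1, 0.5, 1.0):
        target = amplify(fraction)
        print(f"fraction={fraction:<6} amplified={target:.3f} "
              f"recovered={deamplify(target):.3f}")
```

Under this transform a slide with 0.1% tumor maps to a target of roughly 0.25 instead of 0.001, which gives the regression loss a much larger signal for small lesions while leaving 0 and 1 unchanged.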
Abstract
Accurate tumor detection in digital pathology whole-slide images (WSIs) is crucial for cancer diagnosis and treatment planning. Multiple Instance Learning (MIL) has emerged as a widely used approach for weakly-supervised tumor detection on large-scale data without the need for manual annotations. However, traditional MIL methods typically rely on classification tasks that require tumor-free cases as negative examples, which are difficult to obtain in real-world clinical workflows, especially for surgical resection specimens. We address this limitation by reformulating tumor detection as a regression task that estimates the tumor percentage of a WSI, a target that is clinically available across multiple cancer types. In this paper, we analyze the proposed weakly-supervised regression framework by applying it to multiple organs, specimen types, and clinical scenarios. We characterize the robustness of the framework to noise in the tumor percentage used as regression target, and introduce an amplification technique that improves tumor detection sensitivity when learning from small tumor regions. Finally, we provide interpretable insights into the model's predictions by analyzing visual attention and logit maps. Our code is available at https://github.com/DIAGNijmegen/tumor-percentage-mil-regression.
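The regression formulation can be sketched for the simplest instance-based case, mean pooling. The code below assumes that a patch-level model has already produced one logit per patch, converts each logit to a tumor probability with a sigmoid, and averages the probabilities into a slide-level tumor fraction. The function names and the use of a sigmoid are assumptions for illustration; the released repository may structure this differently.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def patch_probabilities(instance_logits: Sequence[float]) -> List[float]:
    """Per-patch tumor probabilities; these can be rendered as a heatmap."""
    return [sigmoid(z) for z in instance_logits]


def mean_pool_tumor_fraction(instance_logits: Sequence[float]) -> float:
    """Slide-level prediction: the mean of the per-patch probabilities.

    Hypothetical helper. Training would compare this value with the
    slide's reported tumor fraction using a regression loss, so no
    tumor-free slides are needed as negative examples.
    """
    if len(instance_logits) == 0:
        raise ValueError("a slide needs at least one patch")
    probs = patch_probabilities(instance_logits)
    return sum(probs) / len(probs)


if __name__ == "__main__":
    # Toy slide: two confidently tumorous patches, six confidently benign.
    logits = [8.0, 8.0, -8.0, -8.0, -8.0, -8.0, -8.0, -8.0]
    print(f"predicted tumor fraction: {mean_pool_tumor_fraction(logits):.3f}")
```

Because the slide-level output is an average of patch-level scores, the same scores that produce the prediction also localize the tumor, which is what makes instance-based variants directly interpretable.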
