Tumor segmentation on whole slide images: training or prompting?
Huaqian Wu, Clara Brémond-Martin, Kévin Bouaou, Cédric Clouchoux
TL;DR
This study tackles tumor segmentation on gigapixel whole slide images (WSIs) under limited annotations by comparing four approaches: patch-based methods, superpixel-based methods, semantic segmentation, and visual prompting using SegGPT with a pre-trained Vision Transformer (ViT). With six WSIs used for training and three for testing, visual prompting achieves a mean Dice score of around $0.856$, outperforming patch-based methods and matching or surpassing the other strategies, with substantially faster inference than patch-based approaches. The work demonstrates the potential of prompting large pre-trained ViTs for medical image analysis when labeled data are scarce, offering a practical, tunable alternative to task-specific model fine-tuning. Overall, visual prompting provides a robust, fast, and data-efficient path for WSI tumor segmentation, although its performance depends on prompt quality and on organ-specific exemplars.
Abstract
Tumor segmentation is a pivotal task in cancer diagnosis. Given the immense dimensions of whole slide images (WSIs) in histology, deep learning approaches for WSI classification mainly operate at the patch or superpixel level. However, these solutions often struggle to capture global WSI information and cannot directly generate a binary mask. Downsampling the WSI and performing semantic segmentation is another possible approach. Although this method is computationally efficient, it requires a large amount of annotated data because reducing the resolution may cause information loss. Visual prompting is a novel paradigm that allows a model to perform new tasks through subtle modifications to the input space rather than adaptation of the model itself. Such an approach has demonstrated promising results on many computer vision tasks. In this paper, we show the efficacy of visual prompting for tumor segmentation in three distinct organs. Compared with classical methods trained for this specific task, our findings reveal that, with appropriate prompt examples, visual prompting can achieve comparable or better performance without extensive fine-tuning.
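The comparison above is reported as a Dice score between a predicted binary tumor mask and the annotated one. As a minimal sketch, assuming the standard definition $2|A \cap B| / (|A| + |B|)$ and masks stored as 2D lists of 0/1 values, the metric could be computed as follows. The function name `dice_score` and the convention of returning 1.0 for two empty masks are illustrative assumptions, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as 2D lists of 0/1.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = pred_sum = target_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel is tumor in both masks
            pred_sum += p            # tumor pixels in the prediction
            target_sum += t          # tumor pixels in the annotation

    if pred_sum + target_sum == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / (pred_sum + target_sum)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    annotated = [[1, 0], [0, 0]]
    print(round(dice_score(predicted, annotated), 4))
```

A mean Dice score over a test set would then be the average of this value across the test slides, although the exact aggregation used in the study is not specified in this section.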
