Leveraging image captions for selective whole slide image annotation

Jingna Qiu; Marc Aubreville; Frauke Wilm; Mathias Öttl; Jonas Utz; Maja Schlereth; Katharina Breininger

TL;DR

Annotating whole-slide images is highly labor-intensive, limiting scalable training for tasks like semantic segmentation and mitotic figure detection. The authors propose prototype sampling, which builds task-specific class prototypes from histopathology image-caption databases and uses patch embeddings to create a similarity map that guides selective region annotation without exhaustively labeling all regions. Across CAMELYON16 and MITOS_WSI_CMC, prototype sampling outperforms random and diversity baselines, achieving near full performance with less than 20% of tissue annotated and demonstrating robustness to the choice of caption database (ARCH vs OpenPath). This approach reduces annotation cost, preserves minority-class information, and offers practical applicability for efficient WSI-centric histopathology pipelines.

Abstract

Acquiring annotations for deep learning tasks based on whole slide images (WSIs), such as creating tissue segmentation masks or detecting mitotic figures, is laborious because of the large image size and the substantial manual work that annotation involves. This paper focuses on identifying and annotating the specific image regions that optimize model training under a limited annotation budget. While random sampling helps capture data variance by collecting annotation regions throughout the WSIs, insufficient data curation may result in an inadequate representation of minority classes. Recent studies proposed diversity sampling to select a set of regions that maximally represent unique characteristics of the WSIs. This is done by pretraining on unlabeled data through self-supervised learning and then clustering all regions in the latent space. However, establishing the optimal number of clusters can be difficult, and not all clusters are task-relevant. This paper presents prototype sampling, a new method for annotation region selection that discovers regions exhibiting typical characteristics of each task-specific class. The process entails identifying class prototypes from extensive histopathology image-caption databases and detecting unlabeled image regions that resemble these prototypes. Our results show that prototype sampling is more effective than random and diversity sampling in identifying annotation regions with valuable training information, resulting in improved model performance in semantic segmentation and mitotic figure detection tasks. Code is available at https://github.com/DeepMicroscopy/Prototype-sampling.
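The core idea, scoring unlabeled regions by their resemblance to class prototypes, can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The function names, the use of cosine similarity, the max aggregation over prototypes, and the simple top-k selection are all assumptions made for illustration, and the two-dimensional embeddings are toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def similarity_map(patch_grid, prototypes):
    """Score each patch by its best match among the class prototypes.

    patch_grid: 2D list (rows x cols) of patch embedding vectors.
    prototypes: list of prototype embedding vectors for one class.
    Taking the max over prototypes is an assumed aggregation rule.
    """
    return [[max(cosine(p, q) for q in prototypes) for p in row]
            for row in patch_grid]

def top_k_patches(sim_map, k):
    """Return (row, col) indices of the k highest-scoring patches."""
    scored = [(s, r, c) for r, row in enumerate(sim_map)
              for c, s in enumerate(row)]
    scored.sort(key=lambda t: -t[0])  # highest similarity first
    return [(r, c) for _, r, c in scored[:k]]

# Toy example: a 2 x 2 grid of patch embeddings and a single prototype.
patch_grid = [[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [-1.0, 0.0]]]
prototypes = [[1.0, 0.0]]

sim = similarity_map(patch_grid, prototypes)
selected = top_k_patches(sim, k=2)
```

In this toy case the patch at `(0, 0)` matches the prototype exactly and the patch at `(1, 0)` partially, so those two would be proposed for annotation first.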

Paper Structure (13 sections, 4 figures, 2 tables)

Introduction
Method
General Setup
Prototype Sampling
Experiments
Image-caption Pair Databases
Semantic Segmentation
Mitotic Figure Detection
Results
Discussion and Conclusion
Acknowledgments.
Disclosure of Interests.
Supplementary Materials

Figures (4)

Figure 1: Workflow of prototype sampling.
Figure 2: (a-b) Results on the CAMELYON16 dataset: mIoU (Tumor) and annotated tumor area (%) as functions of annotated tissue area (%). (c) Results on the MITOS_WSI_CMC dataset: F1 and the ratio of annotated mitotic figures as functions of annotated tissue area (%). Prototype (adapt) can have different amounts of annotated area because the size of each selected region is determined dynamically. All other methods select regions of size $l\times l$. Random sampling can lead to a slightly smaller annotated area when no further non-overlapping region containing at least $10\%$ tissue can be found. All results show median values from five repetitions.
Figure 3: (a-b) Two example prototype images of mitotic figures. (c-d) An example WSI and its similarity map. Ground truth mitotic figures are marked in green in (c) and in red in (d). The blue boxes in (c) and (d) indicate regions selected with random and prototype sampling, respectively (hyperparameter: 3_M).
Figure S1: Comparison of the standard and adaptive region selection methods. (a-b) Results on the CAMELYON16 dataset: mIoU (Tumor) and annotated tumor area (%) as functions of annotated tissue area (%) for prototype sampling across various hyperparameter settings. (c) Results on the MITOS_WSI_CMC dataset: F1 and the ratio of annotated mitotic figures as functions of annotated tissue area (%) for prototype sampling across various hyperparameter settings. All results show median values from five repetitions.
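
The Figure 2 caption notes that random sampling may annotate slightly less area when no further non-overlapping region with at least 10% tissue can be found. The sketch below illustrates that behavior under stated assumptions: a binary tissue mask, square regions of side `size`, and rejection sampling with a fixed number of tries. All names and the `max_tries` cutoff are hypothetical and not taken from the paper.

```python
import random

def tissue_fraction(mask, top, left, size):
    """Fraction of tissue pixels (value 1) inside a size x size region."""
    total = sum(mask[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size))
    return total / (size * size)

def overlaps(a, b, size):
    """True if two size x size regions with top-left corners a and b overlap."""
    return abs(a[0] - b[0]) < size and abs(a[1] - b[1]) < size

def sample_regions(mask, size, n_regions, min_tissue=0.10,
                   max_tries=1000, seed=0):
    """Randomly pick up to n_regions non-overlapping regions with enough tissue.

    Fewer regions are returned when no valid candidate is found within
    max_tries (an assumed stopping rule).
    """
    rng = random.Random(seed)
    rows, cols = len(mask), len(mask[0])
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n_regions:
            break
        top = rng.randint(0, rows - size)
        left = rng.randint(0, cols - size)
        if any(overlaps((top, left), c, size) for c in chosen):
            continue  # reject: overlaps an already selected region
        if tissue_fraction(mask, top, left, size) < min_tissue:
            continue  # reject: too little tissue
        chosen.append((top, left))
    return chosen

# Toy 8 x 8 mask with tissue only in the left half.
mask = [[1 if c < 4 else 0 for c in range(8)] for _ in range(8)]
regions = sample_regions(mask, size=4, n_regions=3)
```

Because tissue covers only the left half of this toy mask, at most two non-overlapping 4 x 4 regions can qualify, so the sampler returns fewer regions than the three requested.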
