Leveraging Weak Supervision for Cell Localization in Digital Pathology Using Multitask Learning and Consistency Loss
Berke Levent Cesur, Ayse Humeyra Dur Karasayar, Pinar Bulutay, Nilgun Kapucuoglu, Cisel Aydin Mericoz, Handan Eren, Omer Faruk Dilbaz, Javidan Osmanli, Burhan Soner Yetkili, Ibrahim Kulac, Can Fahrettin Koyuncu, Cigdem Gunduz-Demir
TL;DR
The study tackles the high cost of obtaining precise cell boundaries for localization in digital pathology by introducing a mixed-supervision framework that uses eyeballing-derived cell counts as auxiliary supervision. A multitask network with a shared encoder jointly learns cell localization and counting, reinforced by a consistency loss that aligns the two tasks via the predicted count and the number of segmented cell objects. The approach achieves superior localization and counting performance compared with single-task baselines and two state-of-the-art methods, particularly when strong point annotations are scarce, while substantially reducing annotation effort. This work demonstrates the practical potential of incorporating eyeballing-derived supervision to scale cell-level analysis in histopathology pipelines.
Abstract
Cell detection and segmentation are integral parts of automated systems in digital pathology, and encoder-decoder networks have emerged as a promising solution for these tasks. However, training these networks has typically required full boundary annotations of cells, which are labor-intensive and difficult to obtain at scale. In many applications, such as cell counting, weaker forms of annotation (point annotations or approximate cell counts) can provide sufficient supervision for training. This study proposes a new mixed-supervision approach for training multitask networks in digital pathology by incorporating cell counts derived from eyeballing, a quick visual estimation method commonly used by pathologists. It makes two main contributions: (1) It proposes, for the first time, a mixed-supervision strategy for digital pathology that uses cell counts obtained by eyeballing as an auxiliary supervisory signal to train a multitask network. (2) It designs this multitask network to learn cell counting and cell localization concurrently, and it introduces a consistency loss that regularizes training by penalizing inconsistencies between the predictions of the two tasks. Experiments on two datasets of hematoxylin-eosin stained tissue images demonstrate that the proposed approach effectively uses this weakest form of annotation, improving performance when stronger annotations are limited. These results highlight the potential of integrating eyeballing-derived ground truths into network training, reducing the need for resource-intensive annotations.
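The quantity behind the consistency loss can be illustrated with a minimal sketch: count the cell objects in a thresholded localization map and compare that number with the output of the counting branch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the names `count_objects` and `consistency_penalty`, the 0.5 threshold, the 4-connectivity rule, and the absolute-difference penalty are all hypothetical. Object counting by connected components is also not differentiable, so this shows what is being compared rather than how the loss would be implemented for training.

```python
from collections import deque


def count_objects(mask):
    """Count 4-connected foreground components in a binary 2D mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    n_objects = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New, unvisited foreground pixel: start a new object
                # and flood-fill everything connected to it.
                n_objects += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return n_objects


def consistency_penalty(prob_map, predicted_count, threshold=0.5):
    """Hypothetical penalty: |predicted count - number of localized objects|.

    prob_map is the localization branch output (per-pixel probabilities);
    predicted_count is the counting branch output for the same image.
    """
    mask = [[p >= threshold for p in row] for row in prob_map]
    return abs(predicted_count - count_objects(mask))


# Toy example: two separate blobs in the localization map,
# while the counting branch predicts three cells.
prob_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.9],
]
penalty = consistency_penalty(prob_map, predicted_count=3.0)
```

In this toy case the two branches disagree by one cell, so the penalty is nonzero. A penalty of zero would mean the counting and localization predictions agree on the number of cells in the image.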
