Leveraging Weak Supervision for Cell Localization in Digital Pathology Using Multitask Learning and Consistency Loss

Berke Levent Cesur, Ayse Humeyra Dur Karasayar, Pinar Bulutay, Nilgun Kapucuoglu, Cisel Aydin Mericoz, Handan Eren, Omer Faruk Dilbaz, Javidan Osmanli, Burhan Soner Yetkili, Ibrahim Kulac, Can Fahrettin Koyuncu, Cigdem Gunduz-Demir

TL;DR

The study tackles the high cost of obtaining precise cell boundaries for localization in digital pathology by introducing a mixed-supervision framework that uses eyeballing-derived cell counts as auxiliary supervision. A multitask network with a shared encoder jointly learns cell localization and counting, reinforced by a consistency loss that aligns the two tasks via the predicted count and the number of segmented cell objects. The approach achieves superior localization and counting performance compared with single-task baselines and two state-of-the-art methods, particularly when strong point annotations are scarce, while substantially reducing annotation effort. This work demonstrates the practical potential of incorporating eyeballing-derived supervision to scale cell-level analysis in histopathology pipelines.

Abstract

Cell detection and segmentation are integral parts of automated systems in digital pathology. Encoder-decoder networks have emerged as a promising solution for these tasks. However, training these networks has typically required full boundary annotations of cells, which are labor-intensive and difficult to obtain at scale. In many applications, such as cell counting, weaker forms of annotation, such as point annotations or approximate cell counts, can provide sufficient supervision for training. This study proposes a new mixed-supervision approach for training multitask networks in digital pathology by incorporating cell counts derived from the eyeballing process, a quick visual estimation method commonly used by pathologists. This study has two main contributions: (1) It is the first to propose a mixed-supervision strategy for digital pathology that uses cell counts obtained by eyeballing as an auxiliary supervisory signal to train a multitask network. (2) This multitask network is designed to concurrently learn the tasks of cell counting and cell localization, and this study introduces a consistency loss that regularizes training by penalizing inconsistencies between the predictions of these two tasks. Our experiments on two datasets of hematoxylin-eosin stained tissue images demonstrate that the proposed approach effectively utilizes the weakest form of annotation, improving performance when stronger annotations are limited. These results highlight the potential of integrating eyeballing-derived ground truths into network training, reducing the need for resource-intensive annotations.

Paper Structure

This paper contains 13 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of the proposed mixed-supervised strategy to train the multitask network with different levels of supervision. In particular, for a training image $I_i \in D_1$, the ground truth mask $S_i$ generated from point annotations and the cell count $C_i$ obtained by counting the annotated points are available, and all loss components are calculated between the ground truths and the predicted values $\widehat{S_i}$ and $\widehat{C_i}$ (red boxes and arrows). For a training image $I_k \in D_2$, only the cell count $C_k$ obtained by eyeballing is available, and only cell count related loss components are calculated (green boxes and arrows). Note that for $I_k$, one can also calculate the consistency term ${\cal L}_{SC}(k)$ since this calculation uses the ground truth count $C_k$, its predicted value $\widehat{C_k}$, and the number $\widehat{C_{s_k}}$ of connected components in the predicted mask $\widehat{S_k}$, but not the ground truth mask $S_k$, which is not available for $I_k$. This calculation is depicted in the purple box on the right. After calculating the joint loss, relevant network weights are updated through backpropagation.
  • Figure 2: Architecture of the proposed multitask network. The number of feature maps used in each block is indicated on its top.
  • Figure 3: An example patch, with $384\times 384$ pixel resolution, cropped from an original image, with $1536\times 1536$ resolution, of the serous carcinoma dataset, and its point annotations. As shown here, obtaining point or boundary annotations is challenging when there are many cells to annotate. It is worth noting that this patch is 1/16th of the original image.
  • Figure 4: Visual results on example test set images. (a) Patches cropped from the original images. The first four patches were selected from the serous carcinoma dataset, and the last four from the MoNuSeg dataset. The patches were cropped for better illustration. (b) Point annotations in the ground truths. (c) MixedSupervision when $p=100$, (d) MixedSupervision when $p=25$, (e) ConCORDe-Net \cite{hagos2019concordenet} when $p=100$, (f) ConCORDe-Net when $p=25$, and (g) SSRNet \cite{deng2023ssrnet}.
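
The Figure 1 caption describes two mechanics that a short sketch may make more concrete: which loss components apply to an image depending on whether it comes from $D_1$ or $D_2$, and how the object count $\widehat{C_{s_k}}$ can be read off a predicted mask as its number of connected components. The Python below is only an illustration under stated assumptions. The function names, the 4-connectivity choice, and the absolute-difference penalty are hypothetical, since this excerpt does not give the actual form of ${\cal L}_{SC}$.

```python
from collections import deque


def count_connected_components(mask):
    """Count 4-connected foreground regions in a binary mask (rows of 0/1).

    Assumption: 4-connectivity; the excerpt does not state which is used.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New region found: flood-fill it with breadth-first search.
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def consistency_penalty(pred_mask, pred_count):
    """Hypothetical stand-in for the consistency term.

    Returns the absolute gap between the counting head's output and the
    number of objects in the predicted mask. Not the paper's formula.
    """
    return abs(pred_count - count_connected_components(pred_mask))


def active_loss_terms(has_point_annotations):
    """List the loss components usable for one training image.

    Images in D_1 (point annotations) use every term; images in D_2
    (eyeballed count only) skip the term that needs a ground truth mask.
    The term names are illustrative labels.
    """
    if has_point_annotations:
        return ["segmentation", "count", "consistency"]
    return ["count", "consistency"]


# Toy predicted mask containing three separate objects.
example_mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
```

For `example_mask`, `count_connected_components` finds three objects, so a counting-head prediction of 4.5 would give a placeholder penalty of 1.5. Neither quantity needs a ground truth mask, which is why the caption notes that the consistency term remains available for images in $D_2$.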