Improving Generalization Capability of Deep Learning-Based Nuclei Instance Segmentation by Non-deterministic Train Time and Deterministic Test Time Stain Normalization
Amirreza Mahbod, Georg Dorffner, Isabella Ellinger, Ramona Woitek, Sepideh Hatamikia
TL;DR
The paper addresses the poor generalization of DL-based nuclei instance segmentation to unseen histopathology datasets, a problem caused by domain shift. It introduces a hybrid approach that combines non-deterministic train-time stain normalization (Macenko with multiple reference images), deterministic test-time stain normalization, morphological test-time augmentation, and model ensembling on top of a strong baseline (DDU-Net). Across seven external datasets, the method consistently improves average Dice, AJI, and PQ over the baseline, by up to 4.9%, 5.4%, and 5.9%, respectively, while test-time stain normalization increases inference time. The work demonstrates a practical route to robust nuclei segmentation across diverse tissues, with potential applicability to other histopathology tasks and architectures, although it adds computational overhead and depends on stain normalization procedures.
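The difference between non-deterministic train-time and deterministic test-time normalization can be illustrated with a minimal sketch. It is not the authors' implementation: `match_channel_stats` is a hypothetical stand-in that matches per-channel RGB mean and standard deviation and is not Macenko normalization, and the function names and the choice of reference index 0 at test time are assumptions made for illustration only.

```python
import numpy as np


def match_channel_stats(image, reference):
    """Stand-in normalizer (NOT Macenko): shift and scale each RGB channel
    of `image` so its mean and std match those of `reference`."""
    img = image.astype(np.float64)
    ref = reference.astype(np.float64)
    img_mean, img_std = img.mean(axis=(0, 1)), img.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    out = (img - img_mean) / (img_std + 1e-8) * ref_std + ref_mean
    return np.clip(out, 0, 255).astype(np.uint8)


def normalize_train(image, references, rng):
    """Non-deterministic: a reference image is drawn at random on every call,
    so the same training image can look different across epochs."""
    ref = references[rng.integers(len(references))]
    return match_channel_stats(image, ref)


def normalize_test(image, references, index=0):
    """Deterministic: the same fixed reference is always used
    (index 0 is an assumption of this sketch)."""
    return match_channel_stats(image, references[index])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random arrays stand in for H&E image tiles and reference images.
    references = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(3)]
    tile = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
    train_view = normalize_train(tile, references, rng)
    test_view = normalize_test(tile, references)
    print(train_view.shape, test_view.shape)
```

In a real pipeline the stand-in would be replaced by an actual Macenko implementation; only the reference-selection logic is the point here.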
Abstract
With the advent of digital pathology and microscopic systems that can scan and save whole slide histological images automatically, there is a growing trend to use computerized methods to analyze the acquired images. Among different histopathological image analysis tasks, nuclei instance segmentation plays a fundamental role in a wide range of clinical and research applications. While many semi- and fully-automatic computerized methods have been proposed for nuclei instance segmentation, deep learning (DL)-based approaches have been shown to deliver the best performance. However, the performance of such approaches usually degrades when they are tested on unseen datasets. In this work, we propose a novel method to improve the generalization capability of a DL-based automatic segmentation approach. Besides utilizing one of the state-of-the-art DL-based models as a baseline, our method incorporates non-deterministic train-time and deterministic test-time stain normalization, as well as ensembling, to boost the segmentation performance. We trained the model with a single training set and evaluated its segmentation performance on seven test datasets. Our results show that the proposed method provides up to 4.9%, 5.4%, and 5.9% better average performance in segmenting nuclei based on Dice score, aggregated Jaccard index, and panoptic quality score, respectively, compared to the baseline segmentation model.
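Of the three reported metrics, the Dice score is the simplest to state: for binary masks A and B it is 2|A ∩ B| / (|A| + |B|). The sketch below computes it with NumPy. It assumes binary foreground masks and treats two empty masks as a perfect match; both are conventions of this example, not details taken from the paper. The aggregated Jaccard index and panoptic quality require instance matching and are not shown.

```python
import numpy as np


def dice_score(pred, target):
    """Dice score between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks are empty; returning 1.0 is a convention of this sketch.
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / total)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 0, 0]]
    target = [[1, 0, 0], [1, 0, 0]]
    # One overlapping pixel, two foreground pixels in each mask: 2*1 / (2+2)
    print(dice_score(pred, target))  # 0.5
```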
