Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
Seonghui Min, Hyun-Jic Oh, Won-Ki Jeong
TL;DR
This paper addresses data scarcity in multi-class histopathology nuclei analysis with a context-conditioned joint diffusion model that co-synthesizes histopathology images, distance maps, and semantic labels as a unified data unit $u=(i,d,l^s)$. Conditioned on a nucleus centroid layout $pc$ and structure-aware text prompts $tc$, the model jointly denoises $i$ and $d$ (Gaussian) and $l^s$ (categorical). It is trained with $L_{total}=\lambda_i L_i+\lambda_d L_d+\lambda_{l^s} L_{l^s}$ and sampled with a guided noise term $\tilde{\epsilon}_\theta$. The synthesized distance map $d$ and the centroid layout $pc$ then drive a marker-controlled watershed that separates individual nuclei and yields instance labels $l^i$. Evaluations on Lizard, PanNuke, and EndoNuke show improved image-label alignment, visual realism validated by pathologists, and better downstream segmentation and classification performance, indicating that the approach works as a data augmentation method across histopathology domains.
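The weighted objective $L_{total}$ can be illustrated with a minimal sketch. The summary does not state the form of the individual terms, so the choices here are assumptions made only for illustration: mean squared error for the Gaussian branches ($L_i$, $L_d$) and a cross-entropy for the categorical branch ($L_{l^s}$). The function names and the weight values are hypothetical.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length flat sequences
    (assumed stand-in for the Gaussian terms L_i and L_d)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood (assumed stand-in for L_{l^s}).
    probs[k] is the predicted class distribution for pixel k,
    labels[k] is the index of the true class for pixel k."""
    eps = 1e-12  # avoids log(0)
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)

def total_loss(l_i, l_d, l_ls, lam_i=1.0, lam_d=1.0, lam_ls=1.0):
    """L_total = lam_i * L_i + lam_d * L_d + lam_ls * L_ls."""
    return lam_i * l_i + lam_d * l_d + lam_ls * l_ls

# Toy values only; the lambda weights are placeholders, not the paper's settings.
l_i = mse([0.1, -0.2], [0.0, 0.0])               # image-noise branch
l_d = mse([0.5, 0.5], [0.4, 0.7])                # distance-map branch
l_ls = cross_entropy([[0.7, 0.3], [0.2, 0.8]], [0, 1])  # semantic-label branch
loss = total_loss(l_i, l_d, l_ls, lam_i=1.0, lam_d=1.0, lam_ls=0.1)
```

The three terms share one scalar objective, so the weights $\lambda_i$, $\lambda_d$, and $\lambda_{l^s}$ control how strongly each modality influences the joint denoiser.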
Abstract
In multi-class histopathology nuclei analysis tasks, the lack of training data is a major bottleneck for the performance of learning-based methods. To tackle this challenge, previous methods have used generative models to augment the training data with synthetic samples. However, existing methods often overlook the context of biological tissues (e.g., shape, spatial layout, and tissue type) in the synthetic data. Moreover, while generative models have shown superior performance in synthesizing realistic histopathology images, none of the existing methods can produce images and their labels simultaneously. In this paper, we introduce a novel framework for co-synthesizing histopathology nuclei images and paired semantic labels using a context-conditioned joint diffusion model. We propose conditioning the diffusion model on nucleus centroid layouts together with structure-related text prompts to incorporate spatial and structural context information into the generation targets. Moreover, we enhance the granularity of our synthesized semantic labels by generating instance-wise nuclei labels from distance maps synthesized concurrently with the images and semantic labels. We demonstrate the effectiveness of our framework in generating high-quality samples on multi-institutional, multi-organ, and multi-modality datasets. Augmentation with our synthetic data consistently outperforms existing augmentation methods on the downstream tasks of nuclei segmentation and classification.
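The step from a synthesized distance map to instance-wise labels can be sketched with a marker-controlled watershed. The code below is a simplified priority-flood variant in pure Python, not the authors' implementation: it assumes the distance map is larger toward nucleus centers, that each centroid is one marker, and that pixels at or below a hypothetical `threshold` are background. The function name `marker_watershed` and the toy input are illustrative only.

```python
import heapq

def marker_watershed(dist, centroids, threshold=0.0):
    """Grow one region per centroid over a distance map.

    dist      : 2D list of floats, larger values = deeper inside a nucleus
    centroids : list of (row, col) markers, one per nucleus
    threshold : pixels with dist <= threshold stay background (label 0)
    Returns a 2D list of instance ids in 1..len(centroids), 0 for background.
    """
    h, w = len(dist), len(dist[0])
    labels = [[0] * w for _ in range(h)]
    heap = []
    order = 0  # tie-breaker: equal priorities pop in insertion order
    for k, (r, c) in enumerate(centroids, start=1):
        labels[r][c] = k
        # Negate the distance so the highest-distance pixels are visited first.
        heapq.heappush(heap, (-dist[r][c], order, r, c))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and labels[nr][nc] == 0 and dist[nr][nc] > threshold:
                labels[nr][nc] = labels[r][c]  # neighbor joins the current region
                heapq.heappush(heap, (-dist[nr][nc], order, nr, nc))
                order += 1
    return labels

# Toy example: two touching nuclei in one row, with peaks at columns 2 and 4.
toy_dist = [[0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 0.0]]
toy_centroids = [(0, 2), (0, 4)]
toy_labels = marker_watershed(toy_dist, toy_centroids)
```

Because every region starts from a centroid marker, two nuclei that touch in the semantic label still receive distinct instance ids. In practice a library routine such as `skimage.segmentation.watershed` applied to the negated distance map would be the more usual choice.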
