An expert-driven data generation pipeline for histological images
Roberto Basla, Loris Giulivi, Luca Magri, Giacomo Boracchi
TL;DR
This work tackles data scarcity in histopathology by introducing an expert-driven pipeline that generates large synthetic datasets for pixel-level nucleus instance segmentation from only a few annotated images. The method decomposes data generation into three stages: Blob Generation via homotopy-based contour interpolation, Blob Placement via prior-driven greedy placement, and Image Generation via AdaIN style transfer, which together synthesize realistic images with correct nucleus annotations. Empirical results show that, in very low-data regimes, training HoVerNet on the generated data approaches the performance of models trained on large real datasets, with notable gains when real data are scarce. The approach offers a practical path to scalable, domain-informed data augmentation for medical image segmentation and sets the stage for further work on domain-shift handling and geometry-conditioned generation.
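To make the Blob Generation idea concrete, the following is a minimal sketch of contour interpolation using a straight-line homotopy, H(t) = (1 - t) C0 + t C1. It assumes both contours are sampled with the same number of corresponding points; the function name `interpolate_contours` and the toy circle/ellipse contours are hypothetical, chosen for illustration only, and the homotopy used in the actual pipeline may differ.

```python
import numpy as np

def interpolate_contours(c0, c1, t):
    """Straight-line homotopy H(t) = (1 - t) * c0 + t * c1 between two contours.

    c0, c1: arrays of shape (N, 2) with corresponding points (an assumption).
    t: scalar in [0, 1]; t=0 returns c0, t=1 returns c1.
    """
    c0 = np.asarray(c0, dtype=float)
    c1 = np.asarray(c1, dtype=float)
    if c0.shape != c1.shape:
        raise ValueError("contours must be sampled with the same number of points")
    return (1.0 - t) * c0 + t * c1

# Toy example: blend a unit circle with an ellipse (illustrative shapes only).
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
ellipse = np.stack([2.0 * np.cos(theta), 0.5 * np.sin(theta)], axis=1)
blob = interpolate_contours(circle, ellipse, 0.5)
```

Sampling `t` at random between pairs of annotated nucleus contours would, under these assumptions, yield many intermediate shapes from only a few annotated examples.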
Abstract
Deep Learning (DL) models have been successfully applied to many tasks, including biomedical cell segmentation and classification in histological images. These models require large amounts of annotated data, which may not always be available, especially in the medical field, where annotations are scarce and expensive. To overcome this limitation, we propose a novel pipeline for generating synthetic datasets for cell segmentation. Given only a handful of annotated images, our method generates a large dataset of images that can be used to effectively train DL instance segmentation models. Our solution is designed to generate cells with realistic shapes and placement by allowing experts to incorporate domain knowledge during dataset generation.
