An expert-driven data generation pipeline for histological images

Roberto Basla, Loris Giulivi, Luca Magri, Giacomo Boracchi

TL;DR

This work tackles data scarcity in histopathology by introducing an expert-driven pipeline that generates large synthetic datasets for pixel-level nucleus instance segmentation from only a few annotated images. The method decomposes data generation into three phases: Blob Generation via homotopy-based contour interpolation, Blob Placement using a prior-driven greedy strategy, and Image Generation through AdaIN style transfer, which together synthesize realistic images with correct nucleus annotations. Empirical results show that, in very low-data regimes, HoVerNet trained on the generated data approaches the performance of models trained on large real datasets, with notable gains when real data are scarce. The approach offers a practical path to scalable, domain-informed data augmentation for medical image segmentation and sets the stage for further enhancements in domain shift handling and geometry-conditioned generation.
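To make the Image Generation phase more concrete, the following is a minimal sketch of the AdaIN operation itself: each content channel is re-normalized so that its mean and standard deviation match those of the corresponding style channel. It is written in plain Python to stay dependency-free. The function name `adain`, the channel-as-flat-list representation, and the `eps` value are assumptions made for illustration; the encoder and decoder that surround this step in a full style-transfer network are omitted, and nothing here is taken from the authors' implementation.

```python
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization on per-channel activations.

    content, style: lists of channels, each channel a flat list of floats
    (hypothetical layout chosen for this sketch). Returns the content
    channels rescaled to the style's per-channel mean and std.
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = fmean(c_chan), pstdev(c_chan)
        s_mean, s_std = fmean(s_chan), pstdev(s_chan)
        # Normalize the content channel, then apply the style statistics.
        scale = s_std / (c_std + eps)
        out.append([scale * (v - c_mean) + s_mean for v in c_chan])
    return out


# Usage: one content channel restyled with one style channel.
content_feats = [[0.0, 1.0, 2.0, 3.0]]
style_feats = [[10.0, 12.0, 14.0, 16.0]]
stylized = adain(content_feats, style_feats)
```

After the call, the stylized channel keeps the relative ordering of the content activations while its mean and spread follow the style channel, which is the property that lets a generated annotation layout take on the appearance of a real stained image.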

Abstract

Deep Learning (DL) models have been successfully applied to many applications, including biomedical cell segmentation and classification in histological images. These models require large amounts of annotated data, which might not always be available, especially in the medical field, where annotations are scarce and expensive. To overcome this limitation, we propose a novel pipeline for generating synthetic datasets for cell segmentation. Given only a handful of annotated images, our method generates a large dataset of images that can be used to effectively train DL instance segmentation models. Our solution is designed to generate cells with realistic shapes and placement by allowing experts to incorporate domain knowledge during the generation of the dataset.

Paper Structure (10 sections, 4 equations, 5 figures, 1 table, 3 algorithms)


Figures (5)

  • Figure 1: Our generation pipeline. Our first phase is Blob Generation (a), which creates a set of new blobs $\{\widetilde{B}_{l}\}_{L}$ by interpolating existing ones. We then perform Blob Placement (b) to generate the GT $\widetilde{M}$ following a prior distribution $\mathcal{P}$ estimated from the few annotated images. Finally, the Image Generation (c) phase performs style transfer to transform $\widetilde{M}$ into the new image $\widetilde{I}$.
  • Figure 2: Examples of interpolation between $B_{k1}$ (blue) and $B_{k2}$ (green). New blobs (in orange) are selected at equally spaced intervals along the interpolation lines and can be seen as different views of a 3D nucleus.
  • Figure 3: Examples of blob placement. Given a prior map $\mathcal{P}$ (a), our greedy placement (c) follows the distribution much more closely than a random weighted placement (b).
  • Figure 4: Example of style transfer.
  • Figure 5: Our results by metric per number of nuclei instances in the real training set.
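The ideas behind Figures 2 and 3 can be illustrated with a short, hedged sketch in plain Python. The first function assumes a straight-line homotopy between two contours that have already been resampled to the same number of corresponding points, and picks new contours at equally spaced values of the interpolation parameter. The second function is one plausible reading of prior-driven greedy placement, not the paper's algorithm: it works on a coarse grid and always places the next blob in the cell whose expected count under the prior most exceeds the number of blobs already placed there, ignoring blob shape and overlap. The names `interpolate_contours` and `greedy_place`, and both data layouts, are hypothetical.

```python
def interpolate_contours(c1, c2, n_new):
    """Return n_new contours equally spaced between c1 and c2.

    c1, c2: lists of (x, y) points of equal length, assumed to be in
    point-to-point correspondence (an assumption of this sketch).
    """
    if len(c1) != len(c2):
        raise ValueError("contours must have the same number of points")
    blobs = []
    for i in range(1, n_new + 1):
        t = i / (n_new + 1)  # strictly between the two source blobs
        blobs.append([((1 - t) * x1 + t * x2, (1 - t) * y1 + t * y2)
                      for (x1, y1), (x2, y2) in zip(c1, c2)])
    return blobs


def greedy_place(prior, n_blobs):
    """Allocate n_blobs to grid cells, greedily following a prior map.

    prior: 2D list of non-negative weights (need not sum to 1).
    Returns a 2D list with the number of blobs placed in each cell.
    """
    rows, cols = len(prior), len(prior[0])
    total = sum(sum(row) for row in prior)
    if total <= 0:
        raise ValueError("prior must contain positive mass")
    counts = [[0] * cols for _ in range(rows)]
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    for _ in range(n_blobs):
        # Deficit = expected blobs under the prior minus blobs placed so far.
        r, c = max(cells,
                   key=lambda rc: n_blobs * prior[rc[0]][rc[1]] / total
                   - counts[rc[0]][rc[1]])
        counts[r][c] += 1
    return counts


# Usage: one intermediate blob between two squares, then four placements.
small = [(0, 0), (2, 0), (2, 2), (0, 2)]
large = [(0, 0), (4, 0), (4, 4), (0, 4)]
new_blobs = interpolate_contours(small, large, 1)
placement = greedy_place([[0.5, 0.25], [0.25, 0.0]], 4)
```

Because the greedy rule is deterministic and tracks the remaining deficit, the resulting counts stay close to the prior even for a small number of blobs, whereas independent weighted sampling can deviate noticeably at that scale. This is consistent with the contrast between panels (b) and (c) of Figure 3, although the paper's actual placement procedure may differ in its details.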