Synthesis of Annotated Colorectal Cancer Tissue Images from Gland Layout
Srijay Deshpande, Fayyaz Minhas, Nasir Rajpoot
TL;DR
The paper tackles the scarcity of richly annotated histopathology data by introducing an interactive framework that jointly synthesizes colorectal tissue images and gland masks from gland layouts. It combines a per-gland latent embedding with a mask generator to produce glandular masks, which are assembled into a tissue mask and passed to a Pix2Pix-like encoder-decoder that generates the tissue image under the supervision of three discriminators. A key novelty is the latent-diffusion-based synthesis of glandular masks via a VQ-VAE, which enables mask generation without a fixed layout and with conditioning on cancer type. Quantitative results show competitive FID scores, and the synthetic annotations prove useful for validating gland-segmentation algorithms, demonstrating potential for scalable generation of annotated histology pairs and for downstream evaluation tasks.
Abstract
Generating realistic tissue images with annotations is a challenging task that is important in many computational histopathology applications. Synthetically generated images and annotations are valuable for training and evaluating algorithms in this domain. To address this need, we propose an interactive framework that generates pairs of realistic colorectal cancer histology images and corresponding glandular masks from glandular structure layouts. The framework accurately captures vital features such as stroma, goblet cells, and glandular lumen. Users can control gland appearance by adjusting parameters such as the number of glands, their locations, and their sizes. The generated images exhibit good Fréchet Inception Distance (FID) scores compared to the state-of-the-art image-to-image translation model. Additionally, we demonstrate the utility of our synthetic annotations for evaluating gland segmentation algorithms. Furthermore, we present a methodology for constructing glandular masks using advanced deep generative models, such as latent diffusion models. These masks enable tissue image generation through a residual encoder-decoder network.
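
To make the notion of a gland layout concrete, the following is a minimal sketch of how a user-specified layout (number of glands, their locations, and their sizes) could be turned into a single labelled tissue mask. It is an illustration under stated assumptions, not the method of the paper: the framework described above produces gland shapes with a learned mask generator, whereas this sketch substitutes plain filled ellipses as placeholders. The names Gland and compose_tissue_mask are hypothetical and do not come from the original work.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Gland:
    """Hypothetical layout entry: centre (cx, cy) and semi-axes (rx, ry) in pixels."""
    cx: float
    cy: float
    rx: float
    ry: float


def compose_tissue_mask(layout: List[Gland], height: int, width: int) -> List[List[int]]:
    """Rasterise each gland as a filled ellipse and merge all glands into one mask.

    Pixel value 0 is background; value k (1-based) marks the k-th gland.
    Where glands overlap, the later gland overwrites the earlier one.
    Ellipses are an assumed stand-in for learned gland shapes.
    """
    mask = [[0] * width for _ in range(height)]
    for label, g in enumerate(layout, start=1):
        if g.rx <= 0 or g.ry <= 0:
            raise ValueError("gland semi-axes must be positive")
        # Scan only the gland's bounding box, clipped to the canvas.
        y0 = max(0, math.floor(g.cy - g.ry))
        y1 = min(height - 1, math.ceil(g.cy + g.ry))
        x0 = max(0, math.floor(g.cx - g.rx))
        x1 = min(width - 1, math.ceil(g.cx + g.rx))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                # Standard ellipse membership test.
                if ((x - g.cx) / g.rx) ** 2 + ((y - g.cy) / g.ry) ** 2 <= 1.0:
                    mask[y][x] = label
    return mask


if __name__ == "__main__":
    # Two glands on a 32x32 canvas; editing this list changes count, position, and size.
    layout = [Gland(cx=8, cy=8, rx=4, ry=3), Gland(cx=22, cy=20, rx=5, ry=5)]
    tissue_mask = compose_tissue_mask(layout, height=32, width=32)
    print(sum(v > 0 for row in tissue_mask for v in row), "gland pixels")
```

In a pipeline of the kind the abstract describes, a mask like this would serve as the conditioning input to the image generator; adding, moving, or resizing entries in the layout list is the kind of user control the framework exposes.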
