Semantic and Visual Crop-Guided Diffusion Models for Heterogeneous Tissue Synthesis in Histopathology

Saghir Alfasly, Wataru Uegami, MD Enamul Hoq, Ghazal Alabtah, H. R. Tizhoosh

TL;DR

This work tackles data scarcity and tissue heterogeneity in histopathology by introducing HeteroTissue-Diffuse (HTD), a dual-conditioning latent diffusion model that couples semantic maps with tissue-specific visual crops to preserve morphological fidelity across heterogeneous regions. A self-supervised extension enables training on unannotated WSIs (TCGA) by clustering patches into 100 tissue phenotypes and generating pseudo-semantic maps, which lets synthesis scale to 11,765 WSIs. The approach yields strong quantitative gains (e.g., up to a 6× reduction in Fréchet Distance on Camelyon16) and near-real-data segmentation performance (IoU within 1-2% of real data on Camelyon16 and Panda), and clinicians validated its realism in blinded expert assessments. By delivering diverse, annotated synthetic histopathology data with privacy protections, HTD promises to democratize access to high-quality training data and accelerate robust AI-driven pathology across cancer types.
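
To make the crop-based visual prompt concrete, here is a minimal Python sketch of one way tissue-specific crops could be pulled from each region of a semantic map. The function `extract_region_crops`, the 32-pixel default crop size, and the rule that a crop must lie entirely inside one class are illustrative assumptions, not HTD's documented procedure.

```python
import numpy as np

def extract_region_crops(image, semantic_map, crop_size=32, rng=None):
    """Return one raw image crop per class present in `semantic_map`.

    Hypothetical helper: a crop is accepted only if every pixel of its
    window belongs to that class, so each visual prompt shows the raw
    tissue of a single semantic region.
    """
    rng = np.random.default_rng(rng)
    h, w = semantic_map.shape
    crops = {}
    for cls in np.unique(semantic_map):
        ys, xs = np.nonzero(semantic_map == cls)
        # Try candidate top-left corners of this class in random order.
        for i in rng.permutation(len(ys)):
            y, x = ys[i], xs[i]
            if y + crop_size > h or x + crop_size > w:
                continue  # window would run off the image
            window = semantic_map[y:y + crop_size, x:x + crop_size]
            if np.all(window == cls):
                crops[int(cls)] = image[y:y + crop_size, x:x + crop_size].copy()
                break
    return crops
```

Each crop is keyed by its class id, so it can be paired with the matching region of the semantic map when conditioning the denoiser; how HTD actually encodes and injects these crops is not described here.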

Abstract

Synthetic data generation in histopathology faces unique challenges: preserving tissue heterogeneity, capturing subtle morphological features, and scaling to unannotated datasets. We present a latent diffusion model that generates realistic heterogeneous histopathology images through a novel dual-conditioning approach combining semantic segmentation maps with tissue-specific visual crops. Unlike existing methods that rely on text prompts or abstract visual embeddings, our approach preserves critical morphological details by directly incorporating raw tissue crops from the corresponding semantic regions. For annotated datasets (i.e., Camelyon16 and Panda), we extract patches with 20-80% tissue heterogeneity. For unannotated data (i.e., TCGA), we introduce a self-supervised extension that clusters patches from whole-slide images into 100 tissue types using foundation model embeddings, automatically generating pseudo-semantic maps for training. Our method synthesizes high-fidelity images with precise region-wise annotations, achieving superior performance on downstream segmentation tasks. On annotated datasets, models trained on our synthetic data perform competitively with those trained on real data, demonstrating the utility of controlled heterogeneous tissue generation. In quantitative evaluation, prompt-guided synthesis reduces Fréchet Distance (FD) by up to 6× on Camelyon16 (from 430.1 to 72.0) and yields 2-3× lower FD on Panda and TCGA. Downstream DeepLabv3+ models trained solely on synthetic data attain test IoU of 0.71 and 0.95 on Camelyon16 and Panda, within 1-2% of the real-data baselines (0.72 and 0.96). By scaling to 11,765 TCGA whole-slide images without manual annotations, our framework offers a practical answer to the urgent need for diverse, annotated histopathology data, addressing a critical bottleneck in computational pathology.
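
The 20-80% heterogeneity criterion for annotated data can be read as a filter on the tumor fraction of each candidate patch. The sketch below assumes a binary slide-level mask, non-overlapping sliding windows (256 px by default), and inclusive bounds; `is_heterogeneous` and `sample_heterogeneous_patches` are hypothetical names, and the paper's listed "Heterogeneous Patch Sampling for Annotated Data" procedure may differ.

```python
import numpy as np

def is_heterogeneous(mask, low=0.2, high=0.8):
    """True if the foreground (e.g. tumor) fraction of a binary mask lies in [low, high]."""
    frac = float(np.mean(np.asarray(mask) > 0))
    return low <= frac <= high

def sample_heterogeneous_patches(mask, patch=256, stride=256, low=0.2, high=0.8):
    """Slide a window over a slide-level binary mask (assumed input) and return
    the (y, x) top-left corners of patches whose tumor fraction is in [low, high]."""
    h, w = mask.shape
    keep = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            if is_heterogeneous(mask[y:y + patch, x:x + patch], low, high):
                keep.append((y, x))
    return keep
```

Patches that are almost entirely tumor or almost entirely normal are skipped, so every retained patch contains at least two substantial tissue regions to condition on.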
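
For reference, FD compares Gaussians fitted to feature embeddings of real and synthetic images: FD = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2(S_r S_g)^{1/2}). The sketch below computes it with NumPy only, using the identity Tr((S_r S_g)^{1/2}) = Tr((S_r^{1/2} S_g S_r^{1/2})^{1/2}) so that every matrix square root is taken of a symmetric positive semi-definite matrix. The feature extractor used in the paper is not stated here, so the inputs are simply assumed to be (n_samples, dim) feature arrays.

```python
import numpy as np

def _sqrtm_psd(a):
    """Square root of a symmetric PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    sa = _sqrtm_psd(cov_a)
    inner = sa @ cov_b @ sa
    inner = (inner + inner.T) / 2.0  # symmetrize against round-off
    # Tr((S_a S_b)^{1/2}) equals Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2})
    cross = np.trace(_sqrtm_psd(inner))
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a) + np.trace(cov_b) - 2.0 * cross)
```

As a check on the reported numbers, 430.1 / 72.0 ≈ 5.97, which is the "up to 6×" reduction on Camelyon16.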
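
The segmentation results are reported as IoU. A minimal sketch of binary IoU and of per-class mean IoU follows; whether the paper's figures are foreground IoU or mean IoU over classes is not specified here, so both are shown, and returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np

def iou(pred, target):
    """Intersection-over-union of two binary masks (1.0 if both are empty, by assumption)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

def mean_iou(pred, target, num_classes):
    """Average of per-class IoU for integer label maps."""
    pred, target = np.asarray(pred), np.asarray(target)
    return float(np.mean([iou(pred == c, target == c) for c in range(num_classes)]))
```

In absolute terms, the reported gaps (0.71 vs. 0.72 and 0.95 vs. 0.96) are 0.01 IoU each, i.e. roughly 1.4% and 1.0% relative to the real-data baselines.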

Paper Structure

This paper contains 36 sections, 9 equations, 26 figures, 7 tables, 3 algorithms.

Figures (26)

  • Figure 1: Comparison of reference and synthetic Camelyon16 patches under three conditioning schemes. From left to right: the original histopathology patch; the binary tumor–normal mask; an image generated with semantic-map conditioning only; an image generated by combining semantic maps with abstract embeddings; and the output of our model conditioned on the semantic map and tissue-specific visual crop prompts. Crop-guided generation recovers fine morphological details and staining heterogeneity more faithfully than embedding-based conditioning.
  • Figure 1: Heterogeneous Patch Sampling for Annotated Data
  • Figure 2: Schematic overview of the HeteroTissue-Diffuse framework for heterogeneous tissue synthesis in histopathology. (a) Unsupervised Tissue Clustering: For the unannotated TCGA dataset, 11,765 whole-slide images (WSIs) are processed to extract 634,435,134 patches (about 634 million). A histopathology foundation model extracts deep features, which are used to cluster the patches into 100 distinct tissue types, producing pseudo-labeled data. (b) Online Region/Mask Sampling: The pseudo-labeled patches are used to generate WSI-level segmentation maps and regional masks for conditioning the diffusion model. (c) Diffusion Model: Our dual-conditioning approach combines semantic maps with visual tissue prompts (crops) to guide the latent diffusion process. The model encodes the input image into the latent space, applies forward diffusion to obtain a noisy latent, and then reverses this process with a UNet denoiser conditioned on both the semantic map and the visual tissue exemplars. For annotated datasets (Camelyon16 bejnordi2017diagnostic and Panda bulten2022artificial), only component (c) is used, with the datasets' existing semantic maps.
  • Figure 2: Test IoU performance of DeepLabv3+ trained on real and synthetic data variants across Camelyon16 and Panda datasets.
  • Figure 3: Validation and test IoU of DeepLabv3+ models trained on different dataset types for (a) Camelyon16 and (b) Panda. Training on synthetic data generated with a visual prompt achieves performance comparable to training on real data, whereas the NoPrompt variant performs worse.
  • ...and 21 more figures
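
Step (a) of Figure 2 clusters foundation-model patch features into 100 tissue types, but the summary does not name the clustering algorithm. Assuming k-means, a minimal NumPy sketch (Lloyd's iterations with k-means++ seeding) over precomputed embeddings looks like this; `kmeans` and its defaults are hypothetical, and at the scale of 634 million patches a mini-batch or GPU implementation would be needed instead.

```python
import numpy as np

def _sq_dists(x, c):
    """(n, k) squared Euclidean distances between rows of x and centroids c."""
    return ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)

def _kmeanspp_init(x, k, rng):
    """k-means++ seeding: each new seed is drawn with probability ~ squared distance."""
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = _sq_dists(x, np.array(centroids)).min(axis=1)
        total = d2.sum()
        idx = rng.choice(len(x), p=d2 / total) if total > 0 else rng.integers(len(x))
        centroids.append(x[idx])
    return np.array(centroids, dtype=float)

def kmeans(x, k, iters=100, seed=0):
    """Plain k-means on (n, d) embeddings; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    centroids = _kmeanspp_init(x, k, rng)
    for _ in range(iters):
        labels = _sq_dists(x, centroids).argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment against the centroids being returned.
    return centroids, _sq_dists(x, centroids).argmin(axis=1)
```

With a hypothetical `(n_patches, dim)` array `embeddings`, `_, labels = kmeans(embeddings, k=100)` would assign each patch one of 100 pseudo tissue types.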
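
Step (b) turns those per-patch pseudo-labels into WSI-level segmentation maps and regional masks. One plausible reading is sketched below under stated assumptions: each patch's cluster label is written at its (row, col) grid position, and a conditioning mask is obtained by cropping a random window of that grid and upsampling it to pixels by repetition. The function names, the `-1` fill for cells without tissue, and the nearest-neighbour upsampling are all assumptions.

```python
import numpy as np

def pseudo_semantic_map(coords, labels, grid_shape, fill=-1):
    """Write each patch's cluster label at its (row, col) grid cell.

    Cells without a tissue patch keep `fill` (assumed to mean background).
    """
    m = np.full(grid_shape, fill, dtype=np.int64)
    coords = np.asarray(coords)
    m[coords[:, 0], coords[:, 1]] = labels
    return m

def sample_region_mask(wsi_map, window, patch_px, rng=None):
    """Hypothetical helper: crop a random window x window block of the grid map
    and upsample it to pixels by nearest-neighbour repetition."""
    rng = np.random.default_rng(rng)
    h, w = wsi_map.shape
    r = int(rng.integers(0, h - window + 1))
    c = int(rng.integers(0, w - window + 1))
    block = wsi_map[r:r + window, c:c + window]
    mask = block.repeat(patch_px, axis=0).repeat(patch_px, axis=1)
    return mask, (r, c)
```

Because the window is re-drawn on every call, each training step can see a different region of the same slide, which is one way to read the "online" in step (b).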
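
Step (c) applies forward diffusion to the encoded latent before UNet denoising. The sketch below shows only the standard DDPM noising step, z_t = sqrt(abar_t) z_0 + sqrt(1 - abar_t) eps, with the common linear beta schedule (1e-4 to 0.02 over 1,000 steps). The schedule HTD uses is not stated here, and the dual-conditioned denoiser itself is out of scope.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed DDPM default)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(z0, t, alpha_bar, rng=None):
    """Sample z_t ~ q(z_t | z_0); returns the noisy latent and the noise used.

    In epsilon-prediction training, the denoiser learns to recover `eps`.
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps, eps
```

At large t the latent is close to pure Gaussian noise (abar_T is on the order of 1e-5 for this schedule), which is why the reverse process needs strong conditioning from the semantic map and crops to recover region-specific morphology.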