Semantic and Visual Crop-Guided Diffusion Models for Heterogeneous Tissue Synthesis in Histopathology

Saghir Alfasly; Wataru Uegami; MD Enamul Hoq; Ghazal Alabtah; H. R. Tizhoosh

TL;DR

This work tackles data scarcity and tissue heterogeneity in histopathology by introducing HeteroTissue-Diffuse (HTD), a dual-conditioning latent diffusion model that couples semantic maps with tissue-specific visual crops to preserve morphological fidelity across heterogeneous regions. A self-supervised extension supports training on unannotated whole-slide images (WSIs) from TCGA by clustering tissue into 100 phenotypes and generating pseudo-semantic maps, which scales synthesis to 11,765 WSIs. The approach yields strong quantitative gains (e.g., up to a $6\times$ reduction in Fréchet Distance on Camelyon16) and near real-data segmentation performance (IoU within 1-2% of real data on Camelyon16 and Panda), while achieving clinician-validated realism in blinded expert assessments. By delivering diverse, annotated synthetic histopathology data with privacy protections, HTD promises to democratize access to high-quality training data and accelerate robust AI-driven pathology across cancer types.

Abstract

Synthetic data generation in histopathology faces unique challenges: preserving tissue heterogeneity, capturing subtle morphological features, and scaling to unannotated datasets. We present a latent diffusion model that generates realistic heterogeneous histopathology images through a novel dual-conditioning approach combining semantic segmentation maps with tissue-specific visual crops. Unlike existing methods that rely on text prompts or abstract visual embeddings, our approach preserves critical morphological details by directly incorporating raw tissue crops from corresponding semantic regions. For annotated datasets (Camelyon16 and Panda), we extract patches ensuring 20-80% tissue heterogeneity. For unannotated data (TCGA), we introduce a self-supervised extension that clusters whole-slide images into 100 tissue types using foundation model embeddings, automatically generating pseudo-semantic maps for training. Our method synthesizes high-fidelity images with precise region-wise annotations, achieving superior performance on downstream segmentation tasks. When evaluated on annotated datasets, models trained on our synthetic data perform competitively with those trained on real data, demonstrating the utility of controlled heterogeneous tissue generation. In quantitative evaluation, prompt-guided synthesis reduces Fréchet Distance (FD) by up to $6\times$ on Camelyon16 (from 430.1 to 72.0) and yields 2-3$\times$ lower FD across Panda and TCGA. Downstream DeepLabv3+ models trained solely on synthetic data attain test IoU of 0.71 and 0.95 on Camelyon16 and Panda, within 1-2% of real-data baselines (0.72 and 0.96). By scaling to 11,765 TCGA whole-slide images without manual annotations, our framework offers a practical way to generate diverse, annotated histopathology data, addressing a critical bottleneck in computational pathology.
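As an illustration of the patch-selection criterion mentioned in the abstract, the following is a minimal sketch of a 20-80% heterogeneity filter. It rests on assumptions not stated in the source: heterogeneity is taken here to mean the fraction of mask pixels carrying a chosen label (for example, tumor = 1 in a binary mask), and the names `label_fraction` and `is_heterogeneous` are hypothetical. The paper's actual definition and implementation may differ.

```python
from typing import Sequence


def label_fraction(mask: Sequence[Sequence[int]], label: int = 1) -> float:
    """Return the fraction of pixels in a 2D mask that equal `label`."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    hits = sum(1 for row in mask for value in row if value == label)
    return hits / total


def is_heterogeneous(
    mask: Sequence[Sequence[int]],
    label: int = 1,
    low: float = 0.20,   # assumed lower bound, from the 20-80% range in the abstract
    high: float = 0.80,  # assumed upper bound, from the 20-80% range in the abstract
) -> bool:
    """Keep a patch only if the chosen label covers between `low` and `high` of it."""
    return low <= label_fraction(mask, label) <= high


# Hypothetical 4x4 binary mask: 4 of 16 pixels are labeled 1 (25% coverage).
mixed_patch = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical patch containing a single tissue type (0% coverage of label 1).
uniform_patch = [[0] * 4 for _ in range(4)]

print(is_heterogeneous(mixed_patch))
print(is_heterogeneous(uniform_patch))
```

Under this reading, a patch that is almost entirely one tissue type is rejected, so every retained patch contains a boundary between regions for the semantic map to describe.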
