Latent Diffusion Models with Image-Derived Annotations for Enhanced AI-Assisted Cancer Diagnosis in Histopathology
Pedro Osorio, Guillermo Jimenez-Perez, Javier Montalt-Tordera, Jens Hooge, Guillem Duran-Ballester, Shivam Singh, Moritz Radbruch, Ute Bach, Sabrina Schroeder, Krystyna Siudak, Julia Vienenkoetter, Bettina Lawrenz, Sadegh Mohammadi
TL;DR
This work tackles data scarcity in histopathology cancer diagnosis by training latent diffusion models (LDMs) on image-derived prompts to synthesize histology patches. It introduces a morphology-enriched prompt-building workflow: DINO-ViT embeddings are grouped by K-means into 33 morphology clusters, which yield 66 prompts. This improves synthetic-data fidelity and coverage (FID drops from 178.8 to 90.2) and improves downstream performance when training on synthetic data alone (AUC rises to 0.805). In a pathologist evaluation, synthetic patches were largely indistinguishable from real ones, which points to potential for data sharing and privacy-preserving augmentation. The results indicate that synthetic data, especially when guided by image-derived morphology cues, can meaningfully augment small real datasets and reduce data-collection costs for cancer CAD in digital pathology.
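The prompt-building workflow described above can be illustrated with a minimal sketch. It assumes the patch embeddings have already been extracted (for example with a DINO-ViT backbone) and are available as plain lists of floats. The small K-means routine, the function names, and the prompt template are hypothetical choices for illustration. Pairing each of the 33 clusters with each of the two PCam labels is one reading that gives 33 × 2 = 66 prompts; the summary does not spell out the exact construction.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    # Initialise centers from k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def build_prompt(label, cluster):
    # Hypothetical template; the wording used by the authors is not given here.
    return f"{label} tissue, morphology group {cluster}"


def all_prompts(n_clusters=33, labels=("healthy", "cancerous")):
    # Assumed construction: every label paired with every morphology cluster.
    return [build_prompt(label, c) for label in labels for c in range(n_clusters)]
```

In use, each training patch would receive the prompt built from its own label and its assigned cluster, so the text conditioning carries morphology information that the binary label alone lacks.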
Abstract
Artificial intelligence (AI)-based image analysis has immense potential to support diagnostic histopathology, including cancer diagnostics. However, developing supervised AI methods requires large-scale annotated datasets. A potentially powerful solution is to augment training data with synthetic data. Latent diffusion models, which can generate high-quality, diverse synthetic images, are promising for this purpose. However, the most common implementations rely on detailed textual descriptions, which are not generally available in this domain. This work proposes a method that constructs structured textual prompts from automatically extracted image features. We experiment with the PCam dataset, composed of tissue patches only loosely annotated as healthy or cancerous. We show that including image-derived features in the prompt, as opposed to only healthy and cancerous labels, improves the Fréchet Inception Distance (FID) from 178.8 to 90.2. We also show that pathologists find it challenging to detect synthetic images, with a median sensitivity/specificity of 0.55/0.55. Finally, we show that synthetic data can effectively train AI models.
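To put the 0.55/0.55 figure in context: a rater who cannot tell real from synthetic patches, and so guesses at random, would be expected to score about 0.5 on both measures. The sketch below shows how the two quantities are computed for a single rater. It assumes "synthetic" is treated as the positive class, which the abstract does not state, and the ratings are made up for illustration; they are not data from the study.

```python
def sensitivity_specificity(truth, guess, positive="synthetic"):
    """Return (sensitivity, specificity) for one rater's calls."""
    pairs = list(zip(truth, guess))
    tp = sum(t == positive and g == positive for t, g in pairs)
    fn = sum(t == positive and g != positive for t, g in pairs)
    tn = sum(t != positive and g != positive for t, g in pairs)
    fp = sum(t != positive and g == positive for t, g in pairs)
    # Sensitivity: share of synthetic patches correctly flagged as synthetic.
    # Specificity: share of real patches correctly called real.
    return tp / (tp + fn), tn / (tn + fp)


# Made-up ratings for illustration only.
truth = ["synthetic"] * 4 + ["real"] * 4
guess = ["synthetic", "synthetic", "real", "real",
         "real", "real", "synthetic", "synthetic"]
sens, spec = sensitivity_specificity(truth, guess)
```

A median across raters would then be taken over the per-rater values returned by this function.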
