What Appears Appealing May Not be Significant! -- A Clinical Perspective of Diffusion Models
Vanshali Sharma
TL;DR
This work tackles the challenge of evaluating the clinical significance of colonoscopy polyp images generated by diffusion models, specifically whether they distinguish adenomatous (AD) from non-adenomatous (Non-AD) polyps. It introduces a two-stage stable-diffusion training approach, built around a denoising U-Net, to capture polyp patterns and pathology labels, and then assesses clinical relevance using $t$-SNE, Kernel Inception Distance (KID), and augmentation-based binary classification. Across three datasets (SUN, CVC-ClinicHD-Segment, CVC-ClinicHD-Classification), the study finds that visually appealing generations from early training iterations do not reliably indicate clinical relevance, and that even the generations with the best quantitative scores may not outperform real data when used for augmentation. The findings underscore the need for dedicated clinical evaluation frameworks in medical image generation and point to future directions for aligning synthetic outputs with tangible clinical utility.
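To make the KID metric mentioned above concrete, here is a minimal sketch of how it is commonly computed: an unbiased estimate of the squared maximum mean discrepancy with a cubic polynomial kernel. It assumes feature vectors have already been extracted (typically from an Inception network) and are passed in as NumPy arrays. The function names `polynomial_kernel` and `kid_score` are illustrative and not taken from the paper, and the paper's exact evaluation setup (feature extractor, subset sizes, number of repetitions) may differ.

```python
import numpy as np


def polynomial_kernel(a, b):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, the usual KID choice."""
    d = a.shape[1]  # feature dimensionality
    return (a @ b.T / d + 1.0) ** 3


def kid_score(real_feats, fake_feats):
    """Unbiased MMD^2 estimate between two sets of feature vectors.

    Assumption: both inputs have shape (num_samples, feature_dim) and
    already hold extracted features, not raw images.
    """
    real_feats = np.asarray(real_feats, dtype=np.float64)
    fake_feats = np.asarray(fake_feats, dtype=np.float64)
    m, n = len(real_feats), len(fake_feats)

    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)

    # Drop the diagonal (self-similarity) terms for the unbiased estimator.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    term_rf = k_rf.mean()

    return term_rr + term_ff - 2.0 * term_rf
```

A lower score means the synthetic features lie closer to the real ones. In practice the estimate is usually averaged over several random subsets of the data, which this sketch omits for brevity.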
Abstract
Recent image generation techniques, such as diffusion models, produce visually appealing results from text descriptions alone. Unlike general images, where assessing quality and alignment with the text description is trivial, establishing such a relation in a clinical setting proves challenging. This work investigates several strategies for evaluating the clinical significance of synthetic polyp images of different pathologies. We further explore whether a relation can be established between qualitative results and their clinical relevance.
