Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback
Janet Wang, Yunbei Zhang, Zhengming Ding, Jihun Hamm
TL;DR
This work addresses data scarcity in dermatology by enabling medically accurate skin-disease image generation through MAGIC, a semi-automated framework that leverages AI-expert collaboration. MAGIC uses expert-crafted clinical checklists evaluated by Multimodal LLMs to guide diffusion-model fine-tuning via two routes, RFT and DPO, and incorporates an Image-to-Image module to accelerate sampling while preserving anatomical context. Empirical results show substantial improvements in clinical fidelity (higher dermatologist-aligned scores, lower FID) and downstream diagnostic accuracy, including a +9.02 percentage-point gain for ResNet18 and a +5.12-point gain for DINOv2 on a 20-condition task, with pronounced benefits in few-shot scenarios. The approach reduces expert labeling workload, remains model-agnostic, and highlights a scalable path for applying foundation-model feedback to specialized medical imaging tasks.
Abstract
The paucity of medical data severely limits the generalizability of diagnostic ML models, as the full spectrum of disease variability cannot be represented by a small clinical dataset. To address this, diffusion models (DMs) have been considered a promising avenue for synthetic image generation and augmentation. However, they frequently produce medically inaccurate images, which degrades the performance of models trained on them. Expert domain knowledge is critical for synthesizing images that correctly encode clinical information, especially when data is scarce and quality outweighs quantity. Existing approaches for incorporating human feedback, such as reinforcement learning (RL) and Direct Preference Optimization (DPO), either rely on robust reward functions or demand labor-intensive expert evaluations. Recent progress in Multimodal Large Language Models (MLLMs) reveals their strong visual reasoning capabilities, making them well-suited candidates as evaluators. In this work, we propose a novel framework, coined MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), that synthesizes clinically accurate skin disease images for data augmentation. Our method translates expert-defined criteria into actionable feedback for DM-based image synthesis, significantly improving clinical accuracy while reducing the direct human workload. Experiments demonstrate that our method greatly improves the clinical quality of synthesized skin disease images, with outputs aligning with dermatologist assessments. Additionally, augmenting training data with these synthesized images improves diagnostic accuracy by 9.02% on a challenging 20-condition skin disease classification task, and by 13.89% in the few-shot setting.
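To make the feedback loop summarized above more concrete, the following is a minimal Python sketch of how per-image checklist verdicts could be turned into training signal for the two fine-tuning routes. It is an illustration under stated assumptions, not the authors' implementation: the binary pass/fail verdict format, the 0.75 threshold, the reading of the RFT route as fine-tuning on high-scoring samples, and the names `verdicts`, `checklist_score`, `select_for_rft`, and `build_dpo_pairs` are all hypothetical.

```python
from itertools import combinations

# Hypothetical checklist verdicts: image id -> one pass/fail answer per
# expert-defined criterion, in the form an MLLM evaluator might return them.
verdicts = {
    "img_a": [True, True, True, False],
    "img_b": [True, False, False, False],
    "img_c": [True, True, True, True],
}


def checklist_score(answers):
    """Fraction of checklist criteria judged satisfied (assumed scoring rule)."""
    return sum(answers) / len(answers)


def select_for_rft(verdicts, threshold=0.75):
    """Keep images whose score meets the threshold (assumed value)."""
    return sorted(
        image_id
        for image_id, answers in verdicts.items()
        if checklist_score(answers) >= threshold
    )


def build_dpo_pairs(verdicts):
    """Build (preferred, rejected) pairs from images with different scores."""
    pairs = []
    for first, second in combinations(sorted(verdicts), 2):
        score_first = checklist_score(verdicts[first])
        score_second = checklist_score(verdicts[second])
        if score_first > score_second:
            pairs.append((first, second))
        elif score_second > score_first:
            pairs.append((second, first))
        # Ties carry no preference signal and are skipped.
    return pairs


rft_set = select_for_rft(verdicts)
dpo_pairs = build_dpo_pairs(verdicts)
```

In this sketch, `rft_set` holds the images that would be kept for a filtering-style fine-tuning pass, while `dpo_pairs` holds the preference pairs a DPO objective would consume. In practice the pairs would presumably be restricted to images generated for the same condition or prompt.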
