DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets

Abdurrahim Yilmaz, Furkan Yuceyalcin, Ece Gokyayla, Donghee Choi, Ozan Erdem, Ali Anil Demircali, Rahmetullah Varol, Ufuk Gorkem Kirabali, Gulsum Gencoglan, Joram M. Posma, Burak Temelkuran

TL;DR

DermaSynth addresses the shortage of large image-text datasets for dermatology vision-language models by generating a large synthetic corpus from open-access images using Gemini 2.0 and self-instruction, with prompts enhanced by dataset metadata to curb hallucinations. It aggregates 92,020 image-text pairs from 45,205 images across clinical and dermatoscopic sources with CC-BY-4.0 licenses (DERM12345, BCN20000, PAD-UFES-20, SCIN, HIBA). The authors also fine-tune a preliminary vision-augmented Llama model (DermatoLlama 1.0) on 5,000 samples and release the dataset, model, and training scripts for open research use. They discuss limitations, such as potential gaps in SOTA model knowledge and the need for expert validation, and outline future directions including multi-agent dermatology subfields and RLHF/RAG integration. The work aims to accelerate AI research in dermatology by providing accessible, instruction-following data and a lightweight, scalable model.

Abstract

A major barrier to developing vision large language models (LLMs) in dermatology is the lack of large image-text pair datasets. We introduce DermaSynth, a dataset comprising 92,020 synthetic image-text pairs curated from 45,205 images (13,568 clinical and 35,561 dermatoscopic) for dermatology-related clinical tasks. Using a state-of-the-art LLM, Gemini 2.0, we applied clinically relevant prompts and the self-instruct method to generate diverse and rich synthetic texts. Dataset metadata were incorporated into the input prompts to reduce potential hallucinations. The resulting dataset builds upon open-access dermatological image repositories (DERM12345, BCN20000, PAD-UFES-20, SCIN, and HIBA) that have permissive CC-BY-4.0 licenses. We also fine-tuned a preliminary Llama-3.2-11B-Vision-Instruct model, DermatoLlama 1.0, on 5,000 samples. We anticipate that this dataset will support and accelerate AI research in dermatology. Data and code underlying this work are accessible at https://github.com/abdurrahimyilmaz/DermaSynth.
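
As an illustration of how dataset metadata might be folded into an input prompt, the following is a minimal sketch and not the authors' actual prompt template. The function name `build_grounded_prompt`, the metadata field names, and the prompt wording are all assumptions made for this example; no model API is called.

```python
from typing import Mapping


def build_grounded_prompt(question: str, metadata: Mapping[str, object]) -> str:
    """Prepend known metadata to a question so the model answers from stated facts.

    Hypothetical helper: the field names and wording are illustrative only.
    """
    # Skip missing fields so that no placeholder value is presented as a fact.
    facts = [
        f"- {key}: {value}"
        for key, value in metadata.items()
        if value not in (None, "")
    ]
    context = "\n".join(facts) if facts else "- (no metadata available)"
    return (
        "Known metadata for the attached image:\n"
        f"{context}\n\n"
        "Use the metadata above as ground truth; do not contradict it.\n"
        f"Question: {question}"
    )


# Example with made-up metadata values (assumed fields, not taken from any dataset).
example = build_grounded_prompt(
    "Describe the lesion.",
    {"modality": "dermatoscopic", "label": "melanocytic nevus", "body_site": None},
)
print(example)
```

The idea is that stating the known label and modality up front leaves the model less room to invent them; fields with no value are dropped rather than filled in.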

Paper Structure

This paper contains 1 section, 4 figures, 1 table.

Table of Contents

  1. Setup.

Figures (4)

  • Figure 1: Overview of the synthetic data creation process for DermaSynth. A state-of-the-art large language model (Gemini 2.0) was used to generate synthetic and clinically relevant image-text pairs.
  • Figure 2: The 20 most common root verb-noun pairs in the questions.
  • Figure 3: An image from the DERM12345 dataset with a field-specific question and a dataset-specific question, along with their Gemini 2.0 answers.
  • Figure 4: Input images with a simple prompt and the answers given by the original Llama model and the DermatoLlama model. Both the standard llama illustration and the variant featuring a dermatoscope were generated using DALL·E 3.