SkinCaRe: A Multimodal Dermatology Dataset Annotated with Medical Caption and Chain-of-Thought Reasoning
Yuhao Shen, Liyuan Sun, Yan Xu, Wenbin Liu, Shuping Zhang, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Tao Lu, Xiaonan He, Xin Gao, Juexiao Zhou
TL;DR
SkinCaRe addresses the need for interpretable dermatology AI by uniting SkinCAP, which provides dermatologist-authored observation-first captions for 4,000 images, with SkinCoT, which offers clinician-verified hierarchical chain-of-thought narratives for 3,041 images. The two branches share a common schema, ethics, and provenance, enabling joint supervision for description and reasoning in a single resource totaling 7,041 cases. Technical validation shows high-quality, bilingual captions and robust CoT reasoning (grand mean 4.9433 across six clinician-rated metrics, with 79% perfect scores). Public availability at HuggingFace supports training and evaluating multimodal models that describe skin findings and explain diagnostic paths, advancing trustworthy, clinically grounded dermatology AI.
Abstract
With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency impedes the advancement of LLM-based methods in dermatologic diagnosis. To address this gap and provide a meticulously annotated dermatology dataset with comprehensive natural language descriptions, we introduce \textbf{SkinCaRe}, a comprehensive multimodal resource that unifies \textit{SkinCAP} and \textit{SkinCoT}. \textbf{SkinCAP} comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, annotated by board-certified dermatologists to provide extensive medical descriptions and captions. In addition, we introduce \textbf{SkinCoT}, a curated dataset pairing 3,041 dermatologic images with clinician-verified, hierarchical chain-of-thought (CoT) diagnoses. Each diagnostic narrative is rigorously evaluated against six quality criteria and iteratively refined until it meets a predefined standard of clinical accuracy and explanatory depth. Together, SkinCAP (captioning) and SkinCoT (reasoning), collectively referred to as SkinCaRe, encompass 7,041 expertly curated dermatologic cases and provide a unified and trustworthy resource for training multimodal models that both describe and explain dermatologic images. SkinCaRe is publicly available at https://huggingface.co/datasets/yuhos16/SkinCaRe.
