DermaCon-IN: A Multi-concept Annotated Dermatological Image Dataset of Indian Skin Disorders for Clinical AI Research
Shanawaj S Madarkar, Mahajabeen Madarkar, Madhumitha Venkatesh, Deepanshu Bansal, Teli Prakash, Konda Reddy Mopuri, Vinaykumar MV, KVL Sathwika, Adarsh Kasturi, Gandla Dilip Raj, PVN Supranitha, Harsh Udai
TL;DR
DermaCon-IN addresses the lack of regionally representative dermatology datasets by introducing a prospectively collected image bank from South Indian outpatient clinics (5,450 images from 3,002 patients). Images are annotated with 8 main etiologies, 19 subclasses, 245 leaf diseases, 49 body sites, and 47 lesion descriptors, along with Fitzpatrick and Monk skin-tone ratings. The dataset is organized under a clinically grounded, three-tier taxonomy inspired by Rook's classification, enabling coarse-to-fine modeling and interpretable supervision via Concept Bottleneck Models (CBMs). Baseline benchmarks across standard CNNs, Vision Transformers, and CBMs show that Swin Transformer backbones perform best for main-class prediction, while CBMs reveal both the promise and the current limitations of concept-grounded reasoning, including activation imbalances across concept groups. Beyond classification, DermaCon-IN supports cross-dataset transfer studies (e.g., with PASSION and Fitzpatrick17k) and provides interpretability analyses (Grad-CAM, concept activations) that connect model decisions with clinical reasoning. The resource advances equitable dermatology AI by reflecting regional disease spectra, skin-tone diversity, and real-world diagnostic workflows, and it lays the groundwork for scalable, interpretable AI in outpatient care in low- and middle-income countries (LMICs).
Abstract
Artificial intelligence is poised to augment dermatological care by enabling scalable image-based diagnostics. Yet the development of robust and equitable models remains hindered by datasets that fail to capture the clinical and demographic complexity of real-world practice. This complexity stems from region-specific disease distributions, wide variation in skin tones, and the underrepresentation of outpatient scenarios from non-Western populations. We introduce DermaCon-IN, a prospectively curated dermatology dataset comprising 5,450 clinical images from 3,002 patients across outpatient clinics in South India. Images are annotated by board-certified dermatologists across 245 distinct diagnoses, structured under a hierarchical, aetiology-based taxonomy adapted from Rook's classification. The dataset captures a wide spectrum of dermatologic conditions and skin-tone variation commonly seen in Indian outpatient care. We benchmark a range of architectures, including convolutional models (ResNet, DenseNet, EfficientNet), transformer-based models (ViT, MaxViT, Swin), and Concept Bottleneck Models, to establish baseline performance and to explore how anatomical and concept-level cues may be integrated. These results are intended to guide future efforts toward interpretable and clinically realistic models. DermaCon-IN provides a scalable and representative foundation for advancing dermatology AI.
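As a rough illustration of how a hierarchical taxonomy can support coarse-to-fine evaluation, the sketch below sums leaf-level probabilities into subclass and main-class probabilities. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `TAXONOMY` mapping, the placeholder class names, and the `roll_up` helper are all hypothetical and do not come from the dataset.

```python
from collections import defaultdict

# Hypothetical placeholder taxonomy: leaf -> (subclass, main class).
# The names are illustrative only and are NOT taken from DermaCon-IN.
TAXONOMY = {
    "leaf_a": ("sub_1", "main_X"),
    "leaf_b": ("sub_1", "main_X"),
    "leaf_c": ("sub_2", "main_X"),
    "leaf_d": ("sub_3", "main_Y"),
}


def roll_up(leaf_probs):
    """Sum leaf-level probabilities into subclass and main-class probabilities."""
    sub_probs = defaultdict(float)
    main_probs = defaultdict(float)
    for leaf, p in leaf_probs.items():
        sub, main = TAXONOMY[leaf]
        sub_probs[sub] += p
        main_probs[main] += p
    return dict(sub_probs), dict(main_probs)


# Assumed leaf-level output of some classifier for a single image.
leaf_probs = {"leaf_a": 0.5, "leaf_b": 0.25, "leaf_c": 0.125, "leaf_d": 0.125}
sub_probs, main_probs = roll_up(leaf_probs)

# Coarse prediction: the main class with the highest aggregated probability.
top_main = max(main_probs, key=main_probs.get)
```

With a mapping like this, one leaf-level classifier can be scored at all three tiers, which is one simple way to compare fine-grained and coarse predictions on the same images.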
