Colorimeter-Supervised Skin Tone Estimation from Dermatoscopic Images for Fairness Auditing
Marin Benčević, Krešimir Romić, Ivana Hartmann Tolić, Irena Galić
TL;DR
This work presents a neural-network pipeline that jointly estimates Fitzpatrick skin type and the Individual Typology Angle (ITA) from dermatoscopic images, using in-person Fitzpatrick labels and colorimeter measurements as ground truth to enable reliable fairness auditing of dermatology AI. It leverages extensive pretraining across synthetic and real datasets and employs an EfficientNet-B0 backbone with an ordinal regression head for Fitzpatrick type and a color regression head for ITA. ITA estimation shows high concordance with colorimeter measurements (ICC3 ≈ 93.9%), substantially outperforming pixel-based baselines, while Fitzpatrick classification reaches moderate agreement, close to but below that of human raters. Applying the estimators to ISIC 2020 and MILK10k reveals strong underrepresentation of darker skin tones, highlighting dataset biases that could affect model fairness. The authors release code and pretrained models to facilitate rapid skin-tone annotation and bias auditing in dermatoscopic datasets.
Abstract
Neural-network-based diagnosis from dermatoscopic images is increasingly used for clinical decision support, yet studies report performance disparities across skin tones. Fairness auditing of these models is limited by the lack of reliable skin-tone annotations in public dermatoscopy datasets. We address this gap with neural networks that predict Fitzpatrick skin type via ordinal regression and the Individual Typology Angle (ITA) via color regression, using in-person Fitzpatrick labels and colorimeter measurements as targets. We further leverage extensive pretraining on synthetic and real dermatoscopic and clinical images. The Fitzpatrick model achieves agreement comparable to human crowdsourced annotations, and ITA predictions show high concordance with colorimeter-derived ITA, substantially outperforming pixel-averaging approaches. Applying these estimators to ISIC 2020 and MILK10k, we find that fewer than 1% of subjects belong to Fitzpatrick types V and VI. We release code and pretrained models as an open-source tool for rapid skin-tone annotation and bias auditing. This is, to our knowledge, the first dermatoscopic skin-tone estimation neural network validated against colorimeter measurements, and it supports growing evidence of clinically relevant performance gaps across skin-tone groups.
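For readers unfamiliar with ITA, the sketch below shows how the angle is conventionally derived from CIELAB lightness L* and yellow-blue b* values, the quantities a colorimeter reports. It is a minimal illustration of the standard definition, ITA = arctan((L* - 50) / b*) × 180/π, and not the authors' code; the function name `ita_degrees` is hypothetical.

```python
import math


def ita_degrees(l_star: float, b_star: float) -> float:
    """Individual Typology Angle in degrees from CIELAB L* and b*.

    Standard definition: ITA = arctan((L* - 50) / b*) * 180 / pi.
    atan2 is used here (an implementation choice, not from the paper)
    so that b* == 0 does not raise a division error; for b* > 0,
    which is typical of skin, it equals the arctan form.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    print(ita_degrees(70.0, 20.0))  # lighter skin gives a larger angle
    print(ita_degrees(30.0, 20.0))  # darker skin gives a negative angle
```

Higher ITA values correspond to lighter skin and lower or negative values to darker skin, which is why ITA can serve as a continuous regression target alongside the ordinal Fitzpatrick scale.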
