Colorimeter-Supervised Skin Tone Estimation from Dermatoscopic Images for Fairness Auditing

Marin Benčević; Krešimir Romić; Ivana Hartmann Tolić; Irena Galić

TL;DR

This work presents a neural-network pipeline that jointly estimates Fitzpatrick skin type (FP) and the Individual Typology Angle (ITA) from dermatoscopic images, trained on colorimeter ground truth and in-person Fitzpatrick labels to enable reliable fairness auditing of dermatology AI. It leverages extensive pretraining on synthetic and real datasets and uses an EfficientNet-B0 backbone with an ordinal regression head for FP and a color regression head for ITA, achieving clinically meaningful agreement with ground truth. ITA estimation shows high concordance with colorimeter measurements (ICC3 ≈ 93.9%), substantially outperforming pixel-based baselines, while FP classification reaches moderate agreement that approaches, but remains below, that of human raters. Applying the estimators to ISIC 2020 and MILK10k reveals strong underrepresentation of darker skin tones, a dataset bias that could affect model fairness. The authors release code and pretrained models to facilitate rapid skin-tone annotation and bias auditing of dermatoscopic datasets.
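The TL;DR does not say which ordinal formulation the FP head uses. As a minimal sketch, assuming a CORAL-style cumulative head that emits five logits, one per threshold between adjacent Fitzpatrick types I to VI, targets can be encoded, trained with binary cross-entropy, and decoded as below. All function names are hypothetical and not taken from the authors' released code.

```python
import math

# Hypothetical helpers; assumes a CORAL-style cumulative ordinal head
# (the exact formulation used by the authors is not stated in the source).
NUM_TYPES = 6  # Fitzpatrick types I..VI, encoded here as integers 1..6


def encode_ordinal(fitz_type: int, num_types: int = NUM_TYPES) -> list[int]:
    """Encode type k as K-1 binary targets t_j = 1 if k > j, for j = 1..K-1."""
    if not 1 <= fitz_type <= num_types:
        raise ValueError(f"type must be in 1..{num_types}, got {fitz_type}")
    return [1 if fitz_type > j else 0 for j in range(1, num_types)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_bce_loss(logits: list[float], fitz_type: int) -> float:
    """Mean binary cross-entropy over the K-1 threshold logits."""
    targets = encode_ordinal(fitz_type, len(logits) + 1)
    total = 0.0
    for z, t in zip(logits, targets):
        # Stable BCE-with-logits: max(z, 0) - z*t + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def decode_ordinal(logits: list[float]) -> int:
    """Predicted type = 1 + number of thresholds with P(type > j) > 0.5."""
    return 1 + sum(1 for z in logits if sigmoid(z) > 0.5)


if __name__ == "__main__":
    print(encode_ordinal(4))                            # [1, 1, 1, 0, 0]
    print(decode_ordinal([3.0, 2.0, 0.5, -1.0, -4.0]))  # 4
```

Compared with plain six-way classification, this encoding penalizes a prediction of type II for a type V subject more than a prediction of type IV. CORAL additionally shares one weight vector across thresholds, differing only in biases, so that the cumulative probabilities stay rank-consistent.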

Abstract

Neural-network-based diagnosis from dermatoscopic images is increasingly used for clinical decision support, yet studies report performance disparities across skin tones. Fairness auditing of these models is limited by the lack of reliable skin-tone annotations in public dermatoscopy datasets. We address this gap with neural networks that predict Fitzpatrick skin type via ordinal regression and the Individual Typology Angle (ITA) via color regression, using in-person Fitzpatrick labels and colorimeter measurements as targets. We further leverage extensive pretraining on synthetic and real dermatoscopic and clinical images. The Fitzpatrick model achieves agreement comparable to human crowdsourced annotations, and ITA predictions show high concordance with colorimeter-derived ITA, substantially outperforming pixel-averaging approaches. Applying these estimators to ISIC 2020 and MILK10k, we find that fewer than 1% of subjects belong to Fitzpatrick types V and VI. We release code and pretrained models as an open-source tool for rapid skin-tone annotation and bias auditing. To our knowledge, this is the first dermatoscopic skin-tone estimation neural network validated against colorimeter measurements, and this work supports growing evidence of clinically relevant performance gaps across skin-tone groups.
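ITA is conventionally defined from CIELAB lightness L* and the yellow-blue component b* as ITA = arctan((L* − 50) / b*) · 180/π. The sketch below shows one way a pixel-averaging baseline of the kind the abstract compares against could work, assuming 8-bit sRGB input, a D65 white point, and averaging in CIELAB rather than in RGB. The source does not describe how its baselines select skin pixels or where they average, so pixel selection is left to the caller, and all function names are hypothetical.

```python
import math

# Hypothetical pixel-averaging ITA baseline (assumptions: sRGB input, D65 white,
# averaging performed in CIELAB). Not the authors' baseline implementation.
XN, YN, ZN = 0.95047, 1.00000, 1.08883  # D65 reference white, Y normalized to 1


def srgb_to_linear(c8: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_lab(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert one 8-bit sRGB pixel to CIELAB (L*, a*, b*) under D65."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / XN), f(y / YN), f(z / ZN)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def ita_degrees(l_star: float, b_star: float) -> float:
    """Individual Typology Angle: arctan((L* - 50) / b*) in degrees."""
    if b_star == 0:
        raise ValueError("ITA is undefined for b* = 0")
    return math.degrees(math.atan((l_star - 50.0) / b_star))


def pixel_average_ita(pixels: list[tuple[int, int, int]]) -> float:
    """Average the given pixels in CIELAB, then compute ITA from mean L* and b*.

    Which pixels count as healthy skin (e.g. excluding the lesion) is up to the caller.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    labs = [srgb_to_lab(p) for p in pixels]
    mean_l = sum(lab[0] for lab in labs) / len(labs)
    mean_b = sum(lab[2] for lab in labs) / len(labs)
    return ita_degrees(mean_l, mean_b)
```

For example, a light, warm skin-like color such as `(230, 180, 140)` yields a positive ITA, since its L* exceeds 50 and its b* is positive. A baseline of this form is sensitive to illumination, vignetting, and lesion pixels leaking into the average, which is one plausible reason a learned color regressor trained on colorimeter targets can outperform it.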

Paper Structure (16 sections, 2 equations, 4 figures, 7 tables)

Introduction
Related work
Methods
Data Description
Data Preprocessing and Pretraining Data
Neural Network Backbone and Pretraining
Fitzpatrick type classification
ITA estimation
Results
Fitzpatrick Type Classification Results
ITA Estimation Results
Discussion
Skin Tone Distribution of Commonly Used Dermatoscopic Benchmarks
Limitations
Comparison of Ordinal Regression to Classification
...and 1 more section

Figures (4)

Figure 1: Examples of ITA estimation. Each row shows random examples from one Fitzpatrick type from the MSKCC dataset weirEvaluatingSkinTone2025. Each entry shows the dermoscopy image (left), the ground-truth skin-color swatch (middle), and the predicted swatch (right), converted from CIELAB to sRGB for visualization.
Figure 2: Bland–Altman plot for ITA predictions. The solid line shows the mean bias (predicted − colorimeter ITA), and the dashed lines show the 95% limits of agreement (LoA).
Figure 3: Distribution of ITA values across datasets. For ISIC 2020 rotembergPatientcentricDatasetImages2021a and MILK10k tschandlMILK10kHierarchicalMultimodal2026, ITA values are model predictions; for MSKCC weirEvaluatingSkinTone2025, ITA values correspond to ground-truth measurements.
Figure 4: Heatmap of Fitzpatrick skin type distributions across datasets (within-dataset percentages). For ISIC 2020 rotembergPatientcentricDatasetImages2021a and MILK10k tschandlMILK10kHierarchicalMultimodal2026, Fitzpatrick types are model predictions; for MSKCC weirEvaluatingSkinTone2025, ground-truth Fitzpatrick labels are shown.
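The bias and 95% limits of agreement plotted in Figure 2, and the ICC3 concordance quoted in the TL;DR, can be computed from paired predicted and colorimeter ITA values as in the sketch below. It assumes the standard Bland–Altman limits (bias ± 1.96 × sample SD of the differences) and that "ICC3" refers to the single-measure, two-way mixed, consistency form ICC(3,1) of Shrout and Fleiss; the source does not state which ICC variant was used, and the function names are hypothetical.

```python
import statistics

# Hypothetical agreement helpers for paired ITA values (predicted vs. colorimeter).


def bland_altman(pred: list[float], ref: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower LoA, upper LoA) for the differences pred - ref."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need two equal-length series with at least 2 items")
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc3_1(ratings: list[list[float]]) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single measure.

    ratings[i][j] is the value that method j (e.g. model, colorimeter)
    assigned to subject i.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with n >= 2 subjects, k >= 2 methods")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)              # between-subjects mean square
    ems = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)
```

Usage would pair each subject's values, e.g. `icc3_1([[p, r] for p, r in zip(pred_ita, colorimeter_ita)])`. Because ICC(3,1) measures consistency, a constant offset between model and colorimeter leaves it at 1.0; that offset shows up instead as the Bland–Altman bias, which is why the two statistics are complementary. On this scale, the reported 93.9% would presumably correspond to an ICC of 0.939.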
