Exploring the Impact of Skin Color on Skin Lesion Segmentation

Kuniko Paxton, Medina Kapo, Amila Akagić, Koorosh Aslansefat, Dhavalkumar Thakker, Yiannis Papadopoulos

Abstract

Skin cancer, particularly melanoma, remains a major cause of morbidity and mortality, making early detection critical. AI-driven dermatology systems often rely on skin lesion segmentation as a preprocessing step to delineate the lesion from surrounding skin and support downstream analysis. While fairness concerns regarding skin tone have been widely studied for lesion classification, the influence of skin tone on the segmentation stage remains under-quantified and is frequently assessed using coarse, discrete skin tone categories. In this work, we evaluate three strong segmentation architectures (UNet, DeepLabV3 with a ResNet50 backbone, and DINOv2) on two public dermoscopic datasets (HAM10000 and ISIC2017) and introduce a continuous pigment/contrast analysis that treats pixel-wise Individual Typology Angle (ITA) values as distributions. Using Wasserstein distances between within-image distributions for skin-only, lesion-only, and whole-image regions, we quantify lesion-skin contrast and relate it to segmentation performance across multiple metrics. Within the range represented in these datasets, global skin tone metrics (Fitzpatrick grouping or mean ITA) show only a weak association with segmentation quality. In contrast, low lesion-skin contrast is consistently associated with larger segmentation errors across models, indicating that boundary ambiguity and low contrast are key drivers of failure. These findings suggest that fairness improvements in dermoscopic segmentation should prioritize robust handling of low-contrast lesions, and that distribution-based pigment measures provide a more informative audit signal than discrete skin-tone categories.
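The contrast measure described in the abstract can be illustrated with a minimal sketch. It assumes the image is already available as CIELAB channels (the RGB-to-Lab conversion is not shown), uses the standard ITA definition, and compares the skin-only and lesion-only ITA distributions with a one-dimensional Wasserstein distance. The function names `ita_degrees` and `lesion_skin_contrast` are hypothetical, and details such as hair removal or the paper's six comparison patterns are not reproduced here.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def ita_degrees(L, b):
    """Pixel-wise Individual Typology Angle (degrees) from CIELAB L* and b*.

    Standard definition: ITA = arctan((L* - 50) / b*) * 180 / pi.
    Assumption: arctan2 is used so that b* == 0 does not divide by zero;
    for b* > 0 it gives the same value as the arctan form.
    """
    L = np.asarray(L, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.degrees(np.arctan2(L - 50.0, b))


def lesion_skin_contrast(L, b, lesion_mask):
    """Wasserstein distance between skin-only and lesion-only ITA values.

    lesion_mask is a boolean array (True inside the lesion) with the same
    shape as L and b. Larger return values mean higher lesion-skin contrast.
    """
    ita = ita_degrees(L, b)
    mask = np.asarray(lesion_mask, dtype=bool)
    skin_values = ita[~mask]
    lesion_values = ita[mask]
    return wasserstein_distance(skin_values, lesion_values)


if __name__ == "__main__":
    # Synthetic 4x4 example: light skin (L* = 70) with a dark lesion (L* = 30).
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True
    L = np.where(mask, 30.0, 70.0)
    b = np.full((4, 4), 20.0)
    print(lesion_skin_contrast(L, b, mask))
```

Because both regions are treated as full distributions rather than single means, a lesion whose ITA values overlap heavily with the surrounding skin yields a small distance even when the global skin tone is unchanged.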

Paper Structure

This paper contains 22 sections, 3 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Research Analysis Overview: After semantic segmentation, skin tone is measured using the six comparison patterns shown in green in (a). From left to right, (a) shows the original image, the ground-truth annotation, the image in which the annotation has been used to remove hair and extract only the skin-colored pixels, the image in which the annotation is used to measure only the lesion's color, and the base color image. Panel (b), reproduced from medelink, illustrates the ITA, which links pixel-wise color to skin tone. Panel (c) describes the six comparison patterns in detail.
  • Figure 2: General Performance for Each Class on HAM: The left panel shows metrics of prediction accuracy, where higher values are better. The right panel shows error rates, where lower values indicate better model performance.
  • Figure 3: Distribution of Skin and Lesion Color for Each Class: The eight panels in the upper row show the per-class distributions obtained by evaluating the colors of each image in HAM. Each panel follows the color-pattern measurement methods defined in Figure 1 and shows trends in the distribution of color patterns in the dataset.
  • Figure 4: Correlation Ranges between Skin Color Measures and Segmentation Performance Across Datasets and Models: This figure presents Spearman rank correlations between each segmentation performance metric and skin-color distance measures across different comparison patterns. The light blue band visualizes the minimum–maximum range of correlations across metrics.
  • Figure 5: DINOv2 Absolute Spearman correlations between skin tone and segmentation performance metrics across seven disease classes
  • ...and 1 more figures
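
The minimum-maximum correlation band described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the helper name `correlation_range`, the metric names, and the numbers in the usage example are all hypothetical, and each array is assumed to hold one value per image.

```python
import numpy as np
from scipy.stats import spearmanr


def correlation_range(distance, metrics):
    """Spearman correlation of one skin-color distance measure with each
    segmentation metric, plus the min-max range across metrics.

    distance: per-image values of one skin-color distance measure.
    metrics:  dict mapping a metric name to per-image metric values.
    Returns (per_metric_rho, (rho_min, rho_max)).
    """
    distance = np.asarray(distance, dtype=float)
    per_metric = {}
    for name, values in metrics.items():
        # spearmanr returns (correlation, p-value); only the correlation is kept.
        rho, _ = spearmanr(distance, np.asarray(values, dtype=float))
        per_metric[name] = float(rho)
    rhos = list(per_metric.values())
    return per_metric, (min(rhos), max(rhos))


if __name__ == "__main__":
    # Made-up values for five images, purely for illustration.
    contrast = [1.0, 2.0, 3.0, 4.0, 5.0]
    scores = {
        "dice": [0.5, 0.6, 0.7, 0.8, 0.9],
        "boundary_error": [9.0, 7.0, 8.0, 3.0, 1.0],
    }
    per_metric, band = correlation_range(contrast, scores)
    print(per_metric, band)
```

Plotting the returned range for each comparison pattern would give a band of the kind the caption describes, with the per-metric correlations lying inside it.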