Assessing the Generalizability of Deep Neural Networks-Based Models for Black Skin Lesions

Luana Barros, Levy Chaves, Sandra Avila

TL;DR

The study examines how well deep neural networks for skin lesion classification generalize to black skin and acral regions. It evaluates six pre-trained models with a ResNet-50 backbone (five self-supervised and one supervised) in two analyses: lesions in acral regions and lesions across Fitzpatrick skin tones, using carefully curated datasets and a two-stage evaluation pipeline. The models perform well on white-skin data but are substantially less accurate, and show bias, on acral and darker-skinned lesions, underscoring the need for diverse training data and population-aware modeling. The work highlights ethical and practical implications for clinical deployment and advocates for transparent, representative datasets to enable inclusive dermatology AI.

Abstract

Melanoma is the most severe type of skin cancer due to its ability to cause metastasis. It is more common in black people, often affecting acral regions: palms, soles, and nails. Deep neural networks have shown tremendous potential for improving clinical care and skin cancer diagnosis. Nevertheless, prevailing studies predominantly rely on datasets of white skin tones, neglecting to report diagnostic outcomes for diverse patient skin tones. In this work, we evaluate supervised and self-supervised models in skin lesion images extracted from acral regions commonly observed in black individuals. Also, we carefully curate a dataset containing skin lesions in acral regions and assess the datasets concerning the Fitzpatrick scale to verify performance on black skin. Our results expose the poor generalizability of these models, revealing their favorable performance for lesions on white skin. Neglecting to create diverse datasets, which necessitates the development of specialized models, is unacceptable. Deep neural networks have great potential to improve diagnosis, particularly for populations with limited access to dermatology. However, including black skin lesions is necessary to ensure these populations can access the benefits of inclusive technology.

Paper Structure (19 sections, 3 figures, 7 tables)


Figures (3)

  • Figure 1: The Fitzpatrick skin type scale. (a) Type 1 (light): pale skin, always burns, and never tans; (b) Type 2 (white): fair, usually burns, tans with difficulty; (c) Type 3 (medium): white to olive, sometimes mild burn, gradually tans to olive; (d) Type 4 (olive): moderate brown, rarely burns, tans with ease to moderate brown; (e) Type 5 (brown): dark brown, very rarely burns, tans very easily; (f) Type 6 (black): very dark brown to black, never burns, tans very easily, deeply pigmented.
  • Figure 2: Each image corresponds to a melanoma sample and is associated with a specific Fitzpatrick scale value, representing a range of skin tones. The images are organized from left to right, following the Fitzpatrick scale (1 to 6). Images retrieved from the Fitzpatrick 17k dataset (Groh et al., 2021).
  • Figure 3: Evaluation pipeline for all models. Given a test image, we adopt the final confidence score as the average confidence over a batch of 50 augmented copies of the input image.
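
The averaging step described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random-flip augmentation, the `toy_model` stand-in for a trained classifier, and all function names are hypothetical. Only the batch of 50 augmented copies and the averaging of confidence scores come from the caption.

```python
import random
from statistics import fmean
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


def random_flip(image: Image, rng: random.Random) -> Image:
    """Hypothetical augmentation: random horizontal and/or vertical flip."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1]  # vertical flip
    return out


def tta_confidence(
    image: Image,
    predict_fn: Callable[[Image], float],
    n_copies: int = 50,
    seed: int = 0,
) -> float:
    """Average the model's confidence over n_copies augmented views of one image."""
    rng = random.Random(seed)
    scores = [predict_fn(random_flip(image, rng)) for _ in range(n_copies)]
    return fmean(scores)


def toy_model(image: Image) -> float:
    """Stand-in for a trained classifier: returns the top-left pixel as a score."""
    return image[0][0]


image = [[0.0, 1.0], [1.0, 0.0]]
score = tta_confidence(image, toy_model)
print(f"averaged confidence over 50 augmented copies: {score:.3f}")
```

In a real pipeline, `predict_fn` would wrap the fine-tuned ResNet-50 and return the predicted melanoma probability, and the augmentations would match those chosen for evaluation. Averaging over many views reduces the sensitivity of the final score to any single crop or flip.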