A Generative AI Approach for Reducing Skin Tone Bias in Skin Cancer Classification

Areez Muhammed Shabu, Mohammad Samar Ansari, Asra Aslam

TL;DR

Synthetic data augmentation with generative AI is shown to substantially reduce bias and increase fairness in conventional dermatological diagnostics, and the paper points to open challenges for future work.

Abstract

Skin cancer is one of the most common cancers worldwide, and early detection is critical for effective treatment. However, current AI diagnostic tools are often trained on datasets dominated by lighter skin tones, leading to reduced accuracy and fairness for people with darker skin. In the International Skin Imaging Collaboration (ISIC) dataset, one of the most widely used benchmarks, over 70% of images show light skin while fewer than 8% show dark skin. This imbalance poses a significant barrier to equitable healthcare delivery and highlights the urgent need for methods that address demographic diversity in medical imaging. This paper addresses the challenge of skin tone imbalance in automated skin cancer detection using dermoscopic images. We present a generative augmentation pipeline that fine-tunes a pre-trained Stable Diffusion model using Low-Rank Adaptation (LoRA) on the dark-skin image subset of the ISIC dataset and generates synthetic dermoscopic images conditioned on lesion type and skin tone. We investigate the utility of these images on two downstream tasks: lesion segmentation and binary classification. For segmentation, models trained on the augmented dataset and evaluated on held-out real images show consistent improvements in IoU, Dice coefficient, and boundary accuracy. These evaluations serve to validate the generated dataset. For classification, an EfficientNet-B0 model trained on the augmented dataset achieved 92.14% accuracy. This paper demonstrates that synthetic data augmentation with generative AI can substantially reduce bias and increase fairness in conventional dermatological diagnostics, and it points to open challenges for future work.
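The abstract reports segmentation quality in terms of IoU and the Dice coefficient. As a reminder of what these two overlap metrics measure, here is a minimal sketch in plain Python, assuming binary masks given as nested lists (1 = lesion pixel, 0 = background). The function name `iou_and_dice` and the toy masks are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def iou_and_dice(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) for two same-shaped binary masks."""
    intersection = 0
    pred_area = 0
    target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_area += 1 if p else 0
            target_area += 1 if t else 0
            intersection += 1 if (p and t) else 0

    union = pred_area + target_area - intersection
    if union == 0:
        # Convention chosen here: two empty masks count as perfect agreement.
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


# Toy 3x3 masks (hypothetical data, for illustration only).
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 1],
          [0, 0, 1]]

iou, dice = iou_and_dice(pred, target)
print(f"IoU = {iou:.2f}, Dice = {dice:.2f}")
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models the same way on a single image; Dice simply weights the overlap more generously.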

Paper Structure (24 sections, 1 equation, 5 figures, 5 tables)

This paper contains 24 sections, 1 equation, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Sample non-melanoma and melanoma images from the ISIC dataset.
  • Figure 2: End-to-end pipeline. The Data Preparation branch (left) analyses the ISIC dataset for skin tone distribution and isolates the underrepresented dark-skin subset (FST V--VI, 1,407 images). The Generation branch (right) fine-tunes a Stable Diffusion model via LoRA on this subset, generates 808 synthetic images, and validates them using statistical similarity metrics. In the Integration stage, original and synthetic data are merged and preprocessed, and the augmented dataset is validated on a segmentation task. The Classification branch trains an EfficientNet-B0 and evaluates its performance.
  • Figure 3: Generated non-melanoma and melanoma images produced by the proposed Gen AI-Diffusers-based pipeline.
  • Figure 4: Distribution of melanoma and non-melanoma images across Fitzpatrick skin types (log scale). The shaded region highlights FST V and VI, which together represent under 8% of the dataset.
  • Figure 5: Training dynamics of the augmented EfficientNet-B0 model (a) loss, (b) accuracy, (c) validation AUC.
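
The counts in the Figure 2 and Figure 4 captions allow a rough bound on how far the augmentation shifts the dataset composition. The sketch below is illustrative arithmetic only: it assumes all 808 synthetic images count toward the FST V-VI group, and it takes the number of real images as a parameter because the captions do not state it. The helper name `dark_skin_share` is hypothetical.

```python
import math


def dark_skin_share(dark_real: int, synthetic: int, total_real: int) -> float:
    """Fraction of dark-skin (FST V-VI) images after adding synthetic ones.

    Assumes every synthetic image belongs to the dark-skin group.
    """
    if total_real <= 0 or total_real < dark_real:
        raise ValueError("total_real must be positive and at least dark_real")
    return (dark_real + synthetic) / (total_real + synthetic)


DARK_REAL = 1407  # FST V-VI images, from the Figure 2 caption
SYNTHETIC = 808   # generated images, from the Figure 2 caption

# The Figure 4 caption puts FST V-VI under 8% of the dataset, so the real
# dataset must hold more than 1407 / 0.08 images. This smallest consistent
# size is a derived bound, not a number reported in the captions.
min_total_real = math.floor(DARK_REAL / 0.08) + 1

share_before = dark_skin_share(DARK_REAL, 0, min_total_real)
share_after = dark_skin_share(DARK_REAL, SYNTHETIC, min_total_real)
print(f"before: {share_before:.4f}, after (upper bound): {share_after:.4f}")
```

Because the share falls as the real dataset grows, the value computed at the smallest consistent size is an upper bound: under these assumptions, the dark-skin share after augmentation is at most about 12%.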