DermDiff: Generative Diffusion Model for Mitigating Racial Biases in Dermatology Diagnosis

Nusrat Munia, Abdullah-Al-Zubaer Imran

TL;DR

The paper tackles racial bias in dermatology AI arising from underrepresented skin tones in public datasets. It introduces DermDiff, a latent diffusion-based framework conditioned on skin-tone and disease attributes via CLIP prompts, complemented by a skin-tone detector and a downstream ResNeXt-101 classifier, trained with real and synthetic data. Empirical results show high fidelity and diversity of generated images (as measured by FID and MS-SSIM) and improved downstream performance and fairness metrics for darker skin tones when synthetic data are incorporated. This approach offers a scalable path to more equitable dermatology AI by augmenting imbalanced datasets with controlled synthetic imagery while preserving diagnostic utility.

Abstract

Skin diseases, such as skin cancer, are a significant public health issue, and early diagnosis is crucial for effective treatment. Artificial intelligence (AI) algorithms have the potential to assist in triaging benign vs malignant skin lesions and improve diagnostic accuracy. However, existing AI models for skin disease diagnosis are often developed and tested on limited and biased datasets, leading to poor performance on certain skin tones. To address this problem, we propose a novel generative model, named DermDiff, that can generate diverse and representative dermoscopic image data for skin disease diagnosis. Leveraging text prompting and multimodal image-text learning, DermDiff improves the representation of underrepresented groups (patients, diseases, etc.) in highly imbalanced datasets. Our extensive experimentation showcases the effectiveness of DermDiff in terms of high fidelity and diversity. Furthermore, downstream evaluation suggests the potential of DermDiff in mitigating racial biases for dermatology diagnosis. Our code is available at https://github.com/Munia03/DermDiff
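
The rebalancing idea in the abstract (prompting a generator for underrepresented groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording in `build_prompt`, the helper `synthetic_quota`, and the "match the largest group" target are all assumptions made for illustration, and the label counts in the usage note are toy values, not figures from the paper. The tone labels A-C follow the grouping used in the Figure 2 caption.

```python
from collections import Counter
from itertools import product

SKIN_TONES = ("A", "B", "C")          # tone groups as labeled in the Figure 2 caption
DIAGNOSES = ("benign", "malignant")   # the two classes named in the abstract


def build_prompt(skin_tone: str, diagnosis: str) -> str:
    """Hypothetical prompt template; the paper's actual wording may differ."""
    return f"a dermoscopic image of a {diagnosis} skin lesion on skin tone {skin_tone}"


def synthetic_quota(labels):
    """Count how many synthetic images each (tone, diagnosis) group would need
    to reach the size of the largest group in the real data (an assumed target)."""
    counts = Counter(labels)
    groups = list(product(SKIN_TONES, DIAGNOSES))
    target = max(counts[g] for g in groups)
    return {g: target - counts[g] for g in groups}


if __name__ == "__main__":
    # Toy, made-up label distribution skewed toward skin tone A.
    toy_labels = [("A", "benign")] * 5 + [("A", "malignant")] * 2 + [("C", "benign")]
    for (tone, dx), n in synthetic_quota(toy_labels).items():
        if n > 0:
            print(f"{n:2d} x {build_prompt(tone, dx)!r}")
```

Each prompt and count pair would then be passed to whatever text-conditioned generator is in use. Matching the largest group is only one possible balancing target; a fixed per-group quota would work equally well with the same structure.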

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Proposed DermDiff framework: a skin tone detector identifies patient race from the dermoscopic images in a dataset; race and other attributes condition image generation in the diffusion-based generative model; and skin disease diagnosis is then performed on both real and generated synthetic images.
  • Figure 2: Visual comparison of dermoscopic images from the Fitzpatrick17k dataset with DermDiff-generated samples. Rows 1-3 denote skin tones A-C. DermDiff-generated (a) benign and (b) malignant images; Fitzpatrick17k (c) benign and (d) malignant images.
  • Figure 3: Diagnostic performance on the DDI dataset when the model was trained on (a) Fitzpatrick, (b) ISIC, and (c) combined real and synthetic image samples.