
FairSkin: Fair Diffusion for Skin Disease Image Generation

Ruichen Zhang, Yuguang Yao, Zhen Tan, Zhiming Li, Pan Wang, Huan Liu, Jingtong Hu, Sijia Liu, Tianlong Chen

TL;DR

FairSkin is a novel diffusion model (DM) framework that mitigates skin-tone bias in synthetic skin disease images through a three-level resampling mechanism. It ensures fairer representation across racial and disease categories, contributing to more equitable skin disease detection in clinical settings.

Abstract

Image generation is a prevailing technique for clinical data augmentation, used to advance diagnostic accuracy and reduce healthcare disparities. Diffusion models (DMs) have become a leading method for generating synthetic medical images, but they suffer from a critical twofold bias: (1) the quality of images generated for Caucasian individuals is significantly higher, as measured by the Fréchet Inception Distance (FID); and (2) the ability of the downstream-task learner to learn critical features from disease images varies across skin tones. These biases pose significant risks, particularly in skin disease detection, where underrepresentation of certain skin tones can lead to misdiagnosis or neglect of specific conditions. To address these challenges, we propose FairSkin, a novel DM framework that mitigates these biases through a three-level resampling mechanism, ensuring fairer representation across racial and disease categories. Our approach significantly improves the diversity and quality of generated images, contributing to more equitable skin disease detection in clinical settings.
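FID, the quality measure named above, compares the mean and covariance of image features from real and generated images. The sketch below computes the Fréchet distance between Gaussians fitted to two precomputed feature arrays with NumPy. It is an illustration under assumptions, not the authors' evaluation code: the features are assumed to be already extracted (standard FID uses Inception-v3 pool features; the extractor is omitted here), and the function name `frechet_distance` is hypothetical. Computing it separately per skin-tone group is one way to surface the per-group quality gap described in the abstract; the abstract does not state exactly how images were grouped.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature arrays.

    FID = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2}).
    Tr((S_a S_b)^{1/2}) equals Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}), and that inner
    matrix is symmetric PSD, so a symmetric square root is enough.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    root_a = _sqrtm_psd(cov_a)
    inner = root_a @ cov_b @ root_a
    inner = (inner + inner.T) / 2.0  # remove floating-point asymmetry before eigh
    cross = _sqrtm_psd(inner)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))
```

Using a symmetric eigendecomposition avoids a general matrix square root of the non-symmetric product S_a S_b; a per-group comparison would call `frechet_distance(real_feats_group, generated_feats_group)` once for each skin-tone group.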

Paper Structure

This paper contains 28 sections, 6 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overview of skin disease imbalance and the FairSkin framework: addressing long-tail distributions in skin disease data (Left) and improving fairness across racial groups through three-level resampling (Right).
  • Figure 2: An overview of the FairSkin framework, illustrating the pipeline from an imbalanced dataset to balanced diffusion model (DM) training and downstream balancing. The process includes class-balanced and square-root random sampling of the training data, balanced DM training with a class diversity loss, and downstream balancing through imbalance-aware augmentation and dynamic reweighting based on validation accuracy.
  • Figure 3: (a) Comparison of FairSkin with baselines on downstream tasks. Without data augmentation, the classifier exhibits the worst fairness. For the other methods, we generate 7,500 images for data augmentation. FairSkin consistently outperforms the other methods across various fairness metrics. (b) Variation in augmented dataset size. In this experiment, we provide an equal number of augmented images for each subcategory; Augmentation-Num is the number of augmented images per class. The results show that ACC, ESSP, and DP each have their own optimal number of augmented images.
  • Figure 4: ACC scores under different methods. We evaluated the ACC for different racial groups as well as the overall ACC. Our method slightly reduced the ACC for groups with higher classification accuracy, but significantly improved the ACC for groups with lower classification accuracy, thereby enhancing fairness.
  • Figure 5: Ethnic proportion search for imbalance-aware augmentation when using FairSkin for downstream disease classification task. We fixed the total number of images for data augmentation at 7,500. All images were generated using the model trained with Training Data Resampling and Balanced DM Training. During the classifier training process, we maintained a fixed racial composition for data augmentation, for example, using a ratio of African:Asian:Caucasian = 0.3:0.2:0.5, which corresponds to 2,250:1,500:3,750 images, with an equal number of images for each disease type within each racial group. We calculated the classifier's ACC, DP, and ESSP.
  • ...and 1 more figure
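The sampling and balancing steps named in the Figure 2 and Figure 5 captions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FairSkin's implementation: the function names `sampling_probs`, `allocate_augmentation`, and `reweight_by_val_acc` are hypothetical; square-root sampling is taken to mean drawing class c with probability proportional to the square root of its sample count; and the exponential form of the accuracy-based reweighting is an assumption, since the caption only says that weights depend on validation accuracy.

```python
import math


def sampling_probs(class_counts, mode="sqrt"):
    """Per-sample draw probability for each class (hypothetical helper).

    mode="balanced": every class receives the same total probability mass.
    mode="sqrt": class c receives mass proportional to sqrt(n_c).
    Within a class, the mass is split evenly over its n_c samples.
    """
    if mode == "balanced":
        mass = {c: 1.0 for c in class_counts}
    elif mode == "sqrt":
        mass = {c: math.sqrt(n) for c, n in class_counts.items()}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    total = sum(mass.values())
    return {c: mass[c] / total / class_counts[c] for c in class_counts}


def allocate_augmentation(total, race_ratio, diseases):
    """Split `total` synthetic images by racial ratio, then evenly over diseases.

    Any remainder within a racial group goes to the first diseases in order,
    so each group's images sum exactly to its rounded share.
    """
    plan = {}
    for race, frac in race_ratio.items():
        per_race = round(total * frac)
        base, extra = divmod(per_race, len(diseases))
        for i, disease in enumerate(diseases):
            plan[(race, disease)] = base + (1 if i < extra else 0)
    return plan


def reweight_by_val_acc(val_acc, temperature=1.0):
    """Assumed rule: exponential weight on validation error, normalized to mean 1.

    Groups with lower validation accuracy receive larger loss weights.
    """
    raw = {g: math.exp((1.0 - acc) / temperature) for g, acc in val_acc.items()}
    mean = sum(raw.values()) / len(raw)
    return {g: w / mean for g, w in raw.items()}
```

As a usage example, with 900 and 100 samples in two classes, square-root sampling draws the rare class 25% of the time instead of 10%, while class-balanced sampling draws it 50% of the time. For the Figure 5 ratio, `allocate_augmentation(7500, {"African": 0.3, "Asian": 0.2, "Caucasian": 0.5}, diseases)` yields 2,250, 1,500, and 3,750 images per group, split evenly across whatever disease labels are passed in.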