Skin Cancer Machine Learning Model Tone Bias

James Pope; Md Hassanuzzaman; William Chapman; Huw Day; Mingmar Sherpa; Omar Emara; Nirmala Adhikari; Ayush Joshi

TL;DR

This study investigates tone bias in CNN-based skin cancer classification trained on ISIC dermoscopic images with Fitzpatrick skin-type annotations. It compares models trained on an imbalanced dataset and on a tone-balanced dataset, quantifies bias using disparate impact (DI), and finds persistent bias toward light tones: the tone DI falls below the fairness threshold of $0.80$ in both setups ($0.577$ on the imbalanced dataset and $0.684$ on the balanced one). The work demonstrates that balancing the data alone does not eliminate tone-related bias. Its results are constrained by a relatively small tone-annotated subset of images, which underscores the need for larger tone-labeled datasets and additional bias-mitigation strategies. The authors propose future directions, including tone classifiers and explainable AI approaches, to better understand and address tone bias in dermatological diagnosis systems and to improve public trust and fairness in AI health applications.

Abstract

Background: Many open-source skin cancer image datasets are the result of clinical trials conducted in countries with lighter skin tones. Due to this tone imbalance, machine learning models derived from these datasets can perform well at detecting skin cancer for lighter skin tones. Any tone bias in these models could introduce fairness concerns and reduce public trust in the artificial intelligence health field.

Methods: We examine a subset of images from the International Skin Imaging Collaboration (ISIC) archive that provide tone information. The subset has a significant tone imbalance, which could explain a model's tone bias. To address this, we train models on the imbalanced dataset and on a balanced dataset for comparison. Each dataset is used to train a deep convolutional neural network to classify the images as malignant or benign. We then evaluate each model's disparate impact, based on selection rate, with respect to dark versus light skin tone.

Results: Using the imbalanced dataset, we found that the model is significantly better at detecting malignant images in lighter tones, resulting in a disparate impact of 0.577. Using the balanced dataset, we found that the model is also significantly better at detecting malignant images in lighter than in darker tones, with a disparate impact of 0.684. Training on either the imbalanced or the balanced dataset still results in a disparate impact well below the standard threshold of 0.80, which suggests the model is biased with respect to skin tone.

Conclusion: The results show that typical skin cancer machine learning models can be tone biased. These results provide evidence that neither diagnosis imbalance nor tone imbalance is the cause of the bias. Other techniques will be necessary to identify and address the bias in these models, an area of future investigation.
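The disparate impact measure used in the abstract can be illustrated with a minimal sketch. It assumes the common definition: the selection rate (fraction of images predicted malignant) of the dark-tone group divided by that of the light-tone group, with values below 0.80 flagged as biased. The function names, which group goes in the numerator, and the toy predictions are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence


def selection_rate(predictions: Sequence[int]) -> float:
    """Fraction of samples predicted positive (assumed encoding: malignant = 1)."""
    if len(predictions) == 0:
        raise ValueError("selection rate is undefined for an empty group")
    return sum(1 for p in predictions if p == 1) / len(predictions)


def disparate_impact(dark_preds: Sequence[int], light_preds: Sequence[int]) -> float:
    """Ratio of the dark-tone selection rate to the light-tone selection rate.

    Assumption: the dark-tone group is treated as the unprivileged group
    (numerator) and the light-tone group as the privileged group (denominator).
    """
    light_rate = selection_rate(light_preds)
    if light_rate == 0:
        raise ValueError("disparate impact is undefined when the light-tone rate is 0")
    return selection_rate(dark_preds) / light_rate


# Hypothetical toy predictions (1 = predicted malignant), not data from the paper.
dark_preds = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # 2 of 10 predicted malignant
light_preds = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # 4 of 10 predicted malignant

FAIRNESS_THRESHOLD = 0.80  # threshold stated in the abstract

di = disparate_impact(dark_preds, light_preds)
below_threshold = di < FAIRNESS_THRESHOLD
print(f"disparate impact = {di:.3f}, below threshold: {below_threshold}")
```

Under this definition a value of 1.0 means both tone groups are selected at the same rate, so the reported values of 0.577 and 0.684 both indicate a lower selection rate for the darker-tone group.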

Paper Structure (25 sections, 5 equations, 8 figures, 6 tables)

Introduction
Material and Methods
Study design and population
Dataset Ethics
Dataset
Evaluation Metrics
Confusion Matrix Definition
Model Evaluation Metric
Model Training
Evaluating Model Bias
Model Bias Solutions
Model Architecture
Hyper-parameter Tuning
Experimental Setup
Results
...and 10 more sections

Figures (8)

Figure 1: Dataset Fitzpatrick Skin Types to Tone Mapping
Figure 2: Imbalanced Dataset
Figure 3: Balanced Dataset (Sampled)
Figure 4: Skin Cancer Image Classifier Architecture / Model
Figure 5: Imbalanced Experiment Setup
...and 3 more figures
