
Achieve Fairness without Demographics for Dermatological Disease Diagnosis

Ching-Hao Chiu, Yu-Jen Chen, Yawen Wu, Yiyu Shi, Tsung-Yi Ho

TL;DR

This work tackles fairness in dermatological disease diagnosis without relying on demographic attributes during training. It introduces the AttEN framework, an attention-based feature entanglement regularization guided by the Soft Nearest Neighbor Loss and by masks derived from the Segment Anything Model (SAM). This regularization pushes the model to rely on features of the diseased region rather than on cues from the surrounding skin. On ISIC 2019 and Fitzpatrick-17k, AttEN achieves a better fairness-accuracy trade-off than methods that do not use sensitive attributes and matches methods that do, and its gains hold across several backbones. Because no demographic labels are needed during training, the approach preserves patient privacy while improving the equalized opportunity and equalized odds metrics, which suggests potential for deployment in real-world clinical settings.
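The TL;DR names the Soft Nearest Neighbor Loss (SNNL) as the measure of feature entanglement. As a point of reference, below is a minimal NumPy sketch of the SNNL as defined by Frosst et al. (2019): for each sample, it measures how much of its Gaussian-weighted neighbourhood belongs to its own class, so well-separated classes give a value near 0 and entangled classes give a large value. The function name `soft_nearest_neighbor_loss` and its `temperature`/`eps` parameters are illustrative assumptions; how AttEN applies, weights, or signs this term is not specified on this page and is not reproduced here.

```python
import numpy as np


def soft_nearest_neighbor_loss(features, labels, temperature=1.0, eps=1e-12):
    """Soft Nearest Neighbor Loss over one batch (after Frosst et al., 2019).

    features: (b, d) array of embeddings; labels: (b,) class labels.
    Larger values mean the classes are more entangled in feature space.
    """
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances, shape (b, b).
    sq_norms = np.sum(x ** 2, axis=1)
    dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T, 0.0)
    # Gaussian similarity; a sample is never its own neighbour.
    sim = np.exp(-dists / temperature)
    np.fill_diagonal(sim, 0.0)
    same_class = (y[:, None] == y[None, :]).astype(float)
    # Neighbourhood mass on the sample's own class vs. on all other samples.
    numerator = np.sum(sim * same_class, axis=1)
    denominator = np.sum(sim, axis=1)
    # -log(numerator / denominator), averaged over the batch; eps guards log(0).
    return float(np.mean(np.log(denominator + eps) - np.log(numerator + eps)))


feats = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]
sep = soft_nearest_neighbor_loss(feats, [0, 0, 1, 1])  # classes form separate clusters
mix = soft_nearest_neighbor_loss(feats, [0, 1, 0, 1])  # each cluster mixes both classes
print(sep, mix)
```

The temperature controls the neighbourhood size: a small value makes only the closest samples count, while a large value averages over most of the batch.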

Abstract

In medical image diagnosis, fairness has become increasingly crucial. Without bias mitigation, deploying unfair AI would harm the interests of the underprivileged population and potentially tear society apart. Recent research addresses prediction biases in deep learning models concerning demographic groups (e.g., gender, age, and race) by utilizing demographic (sensitive attribute) information during training. However, many sensitive attributes naturally exist in dermatological disease images. If the trained model only targets fairness for a specific attribute, it remains unfair for other attributes. Moreover, training a model that can accommodate multiple sensitive attributes is impractical due to privacy concerns. To overcome this, we propose a method that makes fair predictions with respect to sensitive attributes at test time without using such information during training. Inspired by prior work highlighting the impact of feature entanglement on fairness, we enhance the model features by capturing the features related to the sensitive and target attributes and regularizing the feature entanglement between the corresponding classes. This ensures that the model classifies based only on features related to the target attribute, without relying on features associated with sensitive attributes, thereby improving both fairness and accuracy. Additionally, we use disease masks from the Segment Anything Model (SAM) to enhance the quality of the learned features. Experimental results on two dermatological disease datasets demonstrate that the proposed method improves classification fairness compared to state-of-the-art methods.
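The abstract describes separating features tied to the diseased region from features tied to the surrounding skin, with SAM disease masks guiding that split. The sketch below illustrates only the masking step under simple assumptions: a binary lesion mask at image resolution whose height and width are integer multiples of the feature-map size, and masked average pooling that yields one disease-region vector and one skin-region vector. The helpers `downsample_mask` and `region_features` are hypothetical; this is not the paper's AttEN module, which is attention-based.

```python
import numpy as np


def downsample_mask(mask, out_h, out_w):
    """Average-pool a binary (H0, W0) mask down to (out_h, out_w).

    Assumes H0 and W0 are integer multiples of out_h and out_w.
    """
    m = np.asarray(mask, dtype=float)
    h0, w0 = m.shape
    if h0 % out_h or w0 % out_w:
        raise ValueError("mask size must be a multiple of the feature-map size")
    fh, fw = h0 // out_h, w0 // out_w
    # Group pixels into (fh x fw) blocks and average each block.
    return m.reshape(out_h, fh, out_w, fw).mean(axis=(1, 3))


def region_features(feature_map, mask, eps=1e-8):
    """Split a (C, H, W) feature map into disease- and skin-region vectors.

    Returns two (C,) arrays: the mask-weighted mean over the lesion region
    and over its complement (the surrounding skin).
    """
    f = np.asarray(feature_map, dtype=float)
    _, h, w = f.shape
    lesion = downsample_mask(mask, h, w)  # soft weights in [0, 1]
    skin = 1.0 - lesion
    disease_vec = (f * lesion).sum(axis=(1, 2)) / (lesion.sum() + eps)
    skin_vec = (f * skin).sum(axis=(1, 2)) / (skin.sum() + eps)
    return disease_vec, skin_vec


fmap = np.stack([np.array([[1.0, 2.0], [3.0, 4.0]]),
                 np.array([[10.0, 20.0], [30.0, 40.0]])])  # (C=2, H=2, W=2)
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0  # lesion occupies the top-left quarter of the image
disease_vec, skin_vec = region_features(fmap, mask)
print(disease_vec, skin_vec)
```

Since the page notes that the model does not take guided masks as input at test time, a mask like this would only be available during training in such a setup.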

Paper Structure

This paper contains 25 sections, 14 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: The motivation behind our training framework. The image in which the skin is masked out represents the feature map related to the diseased region, and the image in which the diseased region is masked out represents the feature map related to the skin.
  • Figure 2: Illustration of the proposed "AttEN" training framework. The upper arrow entering the AttEN module represents the original input images, while the lower arrow represents the features from a previous internal layer, denoted Feature $A$. The output of the AttEN module, Feature $A^{'}$, proceeds to the next layer of the model. The module can be appended after any layer of the neural network, and multiple modules can be appended to the same network.
  • Figure 3: FATE scores under different $\lambda$ values. The horizontal axis shows $\lambda$ from 0 to 10, and the vertical axis shows the corresponding FATE values. Here, $FC$ is measured by Eodd, $ACC$ by the F1-score, and the backbone model is ResNet18.
  • Figure 4: Qualitative Grad-CAM heatmap visualizations on the ISIC 2019 dataset. The first row shows the original images, the second row shows AttEN trained with the guided masks, the third row shows AttEN trained without the guided masks, and the last row shows the corresponding guided masks generated by SAM. Because the model does not take guided masks as input at test time, they are shown for reference only.
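Figure 3 reports FATE with $FC$ measured by Eodd and $ACC$ by the F1-score, but the exact formulas are not reproduced on this page. The sketch below therefore assumes common formulations: for two sensitive groups and binary labels, EOpp is the absolute gap in true positive rate (TPR) and EOdd is the mean of the absolute TPR and false positive rate (FPR) gaps; FATE is taken as $\mathrm{FATE} = \frac{ACC_m - ACC_b}{ACC_b} - \lambda \frac{FC_m - FC_b}{FC_b}$, comparing a mitigated model $m$ with a baseline $b$, so higher is better. The paper's multi-class definitions may differ, and the names `group_rates`, `eopp_eodd`, and `fate` are hypothetical.

```python
import numpy as np


def group_rates(y_true, y_pred, groups, g):
    """TPR and FPR of binary predictions restricted to sensitive group g."""
    in_group = np.asarray(groups) == g
    t = np.asarray(y_true)[in_group]
    p = np.asarray(y_pred)[in_group]
    tpr = np.mean(p[t == 1] == 1)  # share of positives predicted positive
    fpr = np.mean(p[t == 0] == 1)  # share of negatives predicted positive
    return tpr, fpr


def eopp_eodd(y_true, y_pred, groups):
    """Assumed gaps between two sensitive groups labelled 0 and 1.

    EOpp: |TPR_0 - TPR_1|; EOdd: mean of the TPR gap and the FPR gap.
    """
    tpr0, fpr0 = group_rates(y_true, y_pred, groups, 0)
    tpr1, fpr1 = group_rates(y_true, y_pred, groups, 1)
    eopp = abs(tpr0 - tpr1)
    eodd = 0.5 * (abs(tpr0 - tpr1) + abs(fpr0 - fpr1))
    return float(eopp), float(eodd)


def fate(acc_model, acc_base, fc_model, fc_base, lam):
    """Assumed FATE: relative accuracy gain minus lam times the relative
    change in the fairness criterion FC (lower FC means fairer)."""
    return (acc_model - acc_base) / acc_base - lam * (fc_model - fc_base) / fc_base


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
eopp, eodd = eopp_eodd(y_true, y_pred, groups)
print(eopp, eodd)  # 0.5 0.25
print(fate(acc_model=0.80, acc_base=0.80, fc_model=0.05, fc_base=0.10, lam=1.0))
```

With $\lambda = 0$ the score reduces to the relative accuracy change; larger $\lambda$ rewards reductions in $FC$ more heavily, which is the sweep plotted in Figure 3. The accuracy inputs can be any accuracy measure computed elsewhere, such as the F1-score used there.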