Fitzpatrick Thresholding for Skin Image Segmentation
Duncan Stothers, Sophia Xu, Carlie Reeves, Lia Gracey
TL;DR
This work addresses the skin-tone fairness gap in segmentation-based estimation of psoriasis body surface area (BSA), demonstrating that a single universal decision threshold disadvantages darker Fitzpatrick skin types. It introduces Fitzpatrick thresholding, a simple, model-agnostic approach that selects a per-tone operating point (threshold $\tau_g$ for each Fitzpatrick group $g$) on a tuning split to maximize segmentation metrics such as Dice and binary IoU (bIoU), without retraining. Using a large, publicly sourced psoriasis dataset with Fitzpatrick annotations (754 images, 631 patients) and three architectures (U-Net, ResU-Net with SE, SETR-small), the study shows substantial improvements for the darkest group (Fitz VI): U-Net bIoU up to $+31.46\%$ and Dice up to $+24.13\%$, ResU-Net bIoU up to $+24.63\%$ and Dice up to $+18.01\%$, and SETR-small bIoU up to $+17.14\%$ and Dice up to $+11.04\%$. Crucially, Fitzpatrick thresholding requires no architectural changes or retraining, and the skin-tone labels it depends on can be supplied by high-accuracy classifiers (those trained on Fitzpatrick17k exceed $95\%$ accuracy), making it a practical fairness baseline for dermatology segmentation in both clinical and tele-dermatology settings.
Abstract
Accurate estimation of the body surface area (BSA) affected by a rash, such as psoriasis, is critical for assessing severity, selecting an initial treatment regimen, and following clinical treatment response. Segmentation models for inflammatory skin diseases such as psoriasis perform markedly worse on darker skin tones, potentially impeding equitable care. We assembled a psoriasis dataset sourced from six public atlases, annotated it for Fitzpatrick skin type, and added detailed segmentation masks for every image. Reference models based on U-Net, ResU-Net, and SETR-small were trained without tone information. On the tuning split we swept decision thresholds and selected (i) the global optimum and (ii) the optimum for each Fitzpatrick skin type, for both Dice and binary IoU (bIoU). Adopting Fitzpatrick-specific thresholds lifted segmentation performance for the darkest subgroup (Fitz VI) by up to +31% bIoU and +24% Dice on U-Net, with consistent, though smaller, gains for ResU-Net (+25% bIoU, +18% Dice) and SETR-small (+17% bIoU, +11% Dice). Because Fitzpatrick skin tone classifiers trained on Fitzpatrick17k now exceed 95% accuracy, the cost of the skin tone labeling required for this technique has fallen dramatically. Fitzpatrick thresholding is simple and model-agnostic, requires no architectural changes or retraining, and is virtually cost-free. We present Fitzpatrick thresholding as a potential fairness baseline for future work.
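
The threshold sweep described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`dice`, `sweep_thresholds`, `segment`), the threshold grid, the use of mean per-image Dice as the selection criterion, and the fallback to the global threshold for an unseen group are all assumptions made for illustration. The inputs are assumed to be per-pixel probability maps from an already trained model, ground-truth masks, and one Fitzpatrick label per image from the tuning split.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two boolean masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def sweep_thresholds(probs, masks, groups, thresholds, metric=dice):
    """Pick a global threshold and one threshold per group on a tuning split.

    probs:  list of float arrays with per-pixel lesion probabilities
    masks:  list of boolean ground-truth masks, same shapes as probs
    groups: list of group labels (e.g. Fitzpatrick types), one per image
    """
    def best_tau(indices):
        # Mean per-image metric at each candidate threshold (an assumption;
        # the aggregation used in the paper is not stated here).
        scores = [
            np.mean([metric(probs[i] >= t, masks[i]) for i in indices])
            for t in thresholds
        ]
        # np.argmax returns the first maximum, so ties go to the lowest tau.
        return float(thresholds[int(np.argmax(scores))])

    all_idx = list(range(len(probs)))
    global_tau = best_tau(all_idx)
    per_group = {
        g: best_tau([i for i in all_idx if groups[i] == g])
        for g in sorted(set(groups))
    }
    return global_tau, per_group

def segment(prob, group, per_group, global_tau):
    """Binarize one probability map with its group's threshold.

    Falls back to the global threshold for a group not seen during tuning
    (a hypothetical design choice, not taken from the paper).
    """
    return prob >= per_group.get(group, global_tau)
```

Because only the binarization step changes, the same trained model serves every group; swapping `metric` for a binary IoU function would select thresholds for bIoU instead of Dice.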
