FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling

Yu Tian, Min Shi, Yan Luo, Ava Kouhana, Tobias Elze, Mengyu Wang

TL;DR

This work addresses fairness in medical image segmentation by introducing Harvard-FairSeg, the first large-scale fairness dataset for segmentation of the optic disc and cup in SLO fundus images, annotated with six sensitive attributes. It pairs fair error-bound scaling (FEBS), a loss-reweighting strategy based on per-group error bounds, with an equity-scaled segmentation performance (ESSP) metric to improve and assess fairness, and uses segmentation models such as SAMed and TransUNet to benchmark debiasing approaches. In the reported experiments, FEBS achieves fairness performance superior or comparable to state-of-the-art baselines across multiple demographic groups, which points to the potential of fairness-focused losses and metrics in medical segmentation. The dataset and code are publicly released to encourage further research on, and practical adoption of, fairness considerations in medical AI applications.

Abstract

Fairness in artificial intelligence models has gained significantly more attention in recent years, especially in medicine, where the fairness of medical models is critical to people's well-being and lives. High-quality medical fairness datasets are needed to promote fairness learning research. Existing medical fairness datasets are all for classification tasks, and no fairness datasets are available for medical segmentation, even though segmentation is as important a clinical task as classification and provides detailed spatial information on organ abnormalities ready to be assessed by clinicians. In this paper, we propose the first fairness dataset for medical segmentation, named Harvard-FairSeg, with 10,000 subject samples. In addition, we propose a fair error-bound scaling approach that reweights the loss function with the upper error bound in each identity group, using the Segment Anything Model (SAM). We anticipate that segmentation performance equity can be improved by explicitly tackling the hard cases with high training errors in each identity group. To facilitate fair comparisons, we utilize a novel equity-scaled segmentation performance metric, such as the equity-scaled Dice coefficient, to compare segmentation metrics in the context of fairness. Through comprehensive experiments, we demonstrate that our fair error-bound scaling approach has fairness performance either superior or comparable to that of state-of-the-art fairness learning models. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/datasets/harvard-fairseg10k.
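
To make the idea of an equity-scaled Dice coefficient concrete, the sketch below shows one plausible formulation. The abstract does not spell out the formula, so this is an assumption for illustration: the overall mean Dice is divided by one plus the sum of absolute gaps between the overall Dice and each identity group's mean Dice. The function names `dice` and `equity_scaled_dice`, the per-sample averaging, and the toy masks are hypothetical and are not taken from the released code.

```python
from collections import defaultdict


def dice(pred, target):
    """Dice coefficient for two flat binary masks (sequences of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention (assumed): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def equity_scaled_dice(preds, targets, groups):
    """Overall mean Dice, penalized by per-group deviation from it.

    Assumed formulation: overall / (1 + sum_g |overall - mean_dice_g|).
    """
    scores = [dice(p, t) for p, t in zip(preds, targets)]
    overall = sum(scores) / len(scores)

    # Collect per-sample scores by identity group.
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)

    # Sum of absolute gaps between overall and per-group mean Dice.
    disparity = sum(
        abs(overall - sum(vals) / len(vals)) for vals in by_group.values()
    )
    return overall / (1.0 + disparity)


# Toy example: group "A" is segmented perfectly, group "B" is not.
preds = [[1, 1, 0, 0], [1, 0, 0, 0]]
targets = [[1, 1, 0, 0], [1, 1, 0, 0]]
groups = ["A", "B"]
print(equity_scaled_dice(preds, targets, groups))
```

Under this assumed formulation, the score equals the plain mean Dice when every group performs identically and shrinks as the per-group gaps grow, so a model cannot score well by excelling on one group alone.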

Paper Structure (11 sections, 7 equations, 1 figure, 6 tables)

This paper contains 11 sections, 7 equations, 1 figure, 6 tables.

Figures (1)

  • Figure 1: The process to obtain ground truth disc and cup boundaries on the SLO fundus image. The OCT and SLO fundus images have been previously registered using NiftyReg.