Harvard Glaucoma Fairness: A Retinal Nerve Disease Dataset for Fairness Learning and Fair Identity Normalization

Yan Luo, Yu Tian, Min Shi, Louis R. Pasquale, Lucy Q. Shen, Nazlee Zebardast, Tobias Elze, Mengyu Wang

TL;DR

Harvard-GF introduces the first publicly available fairness-focused medical imaging dataset with 2D retinal nerve fiber layer thickness (RNFLT) maps and 3D optical coherence tomography (OCT) scans, balanced across White, Black, and Asian groups to study fairness in glaucoma detection. The authors propose Fair Identity Normalization (FIN), a plug-and-play normalization that uses identity-specific statistics to equalize feature importance across groups, and an equity-scaled metric that jointly evaluates accuracy and fairness. On both RNFLT and OCT data, FIN outperforms state-of-the-art fairness methods, notably improving performance for underrepresented racial groups while maintaining overall accuracy. The work contributes both a dataset and practical fairness techniques for medical imaging AI, and its public release supports reproducibility and broader fairness benchmarking.
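The identity-specific normalization idea can be illustrated with a minimal sketch, assuming plain group-wise standardization: each feature vector is centered and scaled with the mean and standard deviation of its own identity group rather than with statistics shared across the whole batch. The function name `fair_identity_normalize` and the `eps` parameter are hypothetical. Here the statistics are computed directly from the batch, whereas the paper's FIN is a trainable layer that may learn its identity-specific parameters; consult the paper for the exact formulation.

```python
from statistics import fmean, pstdev


def fair_identity_normalize(features, groups, eps=1e-5):
    """Standardize each feature vector with the statistics of its own identity group.

    Hypothetical sketch: per-group mean and population std are computed from the
    given batch for every feature dimension; nothing is learned here.
    """
    stats = {}
    for g in set(groups):
        rows = [f for f, gg in zip(features, groups) if gg == g]
        # Transpose the group's rows so each tuple holds one feature dimension.
        stats[g] = [(fmean(col), pstdev(col)) for col in zip(*rows)]
    # Normalize every sample with the (mean, std) pairs of its own group.
    return [
        [(x - mu) / (sd + eps) for x, (mu, sd) in zip(f, stats[g])]
        for f, g in zip(features, groups)
    ]


# Toy batch: group "B" is shifted by about 100 on the first feature.
features = [[1.0, 10.0], [3.0, 30.0], [100.0, 5.0], [102.0, 7.0]]
groups = ["A", "A", "B", "B"]
normalized = fair_identity_normalize(features, groups)
for row in normalized:
    print([round(x, 3) for x in row])
```

After normalization both groups sit on a common scale (each row is approximately [-1, -1] or [1, 1]), so a downstream classifier can no longer exploit the group-level offset in the first feature.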

Abstract

Fairness (used interchangeably with equity) in machine learning is important for societal well-being, but the scarcity of public datasets hinders progress. Currently, no dedicated public medical datasets with imaging data are available for fairness learning, even though minority groups suffer disproportionately from health issues. To address this gap, we introduce Harvard Glaucoma Fairness (Harvard-GF), a retinal nerve disease dataset with both 2D and 3D imaging data and balanced racial groups for glaucoma detection. Glaucoma is the leading cause of irreversible blindness globally, and Black individuals have twice the glaucoma prevalence of other races. We also propose a fair identity normalization (FIN) approach to equalize feature importance across identity groups. Compared with various state-of-the-art fairness learning methods, FIN achieves superior performance on racial, gender, and ethnicity fairness tasks with both 2D and 3D imaging data, demonstrating the utility of Harvard-GF for fairness learning. To facilitate fairness comparisons between models, we propose an equity-scaled performance measure, which can be flexibly applied to all kinds of performance metrics in the context of fairness. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/datasets/harvard-glaucoma-fairness-3300-samples/.
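The equity-scaled measure can be sketched as follows, assuming the form ES-M = M / (1 + sum over groups g of |M - M_g|), where M is a metric computed on the whole test set and M_g is the same metric restricted to identity group g. This form is our reading of the paper's definition, and the helper names `auc` and `equity_scaled` are hypothetical. AUC is computed with the pairwise (Mann-Whitney) formulation to stay dependency-free; any function with the same `(scores, labels)` signature can be passed as `metric`, matching the abstract's point that the measure applies to all kinds of performance metrics.

```python
from itertools import product


def auc(scores, labels):
    """Pairwise AUC: probability a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def equity_scaled(metric, scores, labels, groups):
    """Overall metric divided by 1 + the summed absolute per-group deviations."""
    overall = metric(scores, labels)
    gap = 0.0
    for g in sorted(set(groups)):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        group_value = metric([scores[i] for i in idx], [labels[i] for i in idx])
        gap += abs(overall - group_value)
    return overall / (1.0 + gap)


scores = [0.9, 0.8, 0.2, 0.1, 0.6, 0.3, 0.7, 0.4]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
overall_auc = auc(scores, labels)
es_auc = equity_scaled(auc, scores, labels, groups)
print(overall_auc, round(es_auc, 4))
```

In this toy example the overall AUC is 0.8125, but group B's AUC is only 0.25 (group A's is 1.0), so the summed gap of 0.75 shrinks the equity-scaled AUC to about 0.4643.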

Paper Structure (14 sections, 8 equations, 8 figures, 8 tables, 1 algorithm)

Figures (8)

  • Figure 1: Illustration highlighting that fairness metrics such as demographic parity difference (DPD) and difference in equalized odds (DEOdds) may not adequately account for the trade-off between accuracy and equity, even when the social identities associated with the samples are balanced. This misalignment is particularly problematic in safety-critical medical applications, which demand high accuracy.
  • Figure 2: Illustrations depicting RNFLT maps, OCT B-scan images, and the relationship between the two data types.
  • Figure 3: The distributions of the samples categorized by various factors, including glaucoma class (a), race (b), gender (c), ethnicity (d), and age (e).
  • Figure 4: The distributions of retinal nerve fiber layer thickness and of vision loss severity, measured by mean deviation, across different racial and gender groups.
  • Figure 5: Schematic view of the proposed fair identity normalization.
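The point of Figure 1 can be reproduced with a small sketch. The definitions below are assumptions that follow common usage (for example, Fairlearn's `demographic_parity_difference` and `equalized_odds_difference`): DPD is the largest gap in positive-prediction rate between groups, and DEOdds is the larger of the largest true-positive-rate gap and the largest false-positive-rate gap. The paper's exact definitions may differ, and all function names here are hypothetical. A classifier that predicts "no glaucoma" for everyone scores a perfect 0 on both metrics while being clinically useless.

```python
def group_rates(preds, labels, groups):
    """Per-group selection rate, true-positive rate, and false-positive rate.

    Assumes binary 0/1 predictions and labels, and that every group contains
    at least one positive and one negative sample.
    """
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        pos = [preds[i] for i in idx if labels[i] == 1]
        neg = [preds[i] for i in idx if labels[i] == 0]
        if not pos or not neg:
            raise ValueError(f"group {g!r} needs both positive and negative samples")
        rates[g] = {
            "selection": sum(preds[i] for i in idx) / len(idx),
            "tpr": sum(pos) / len(pos),
            "fpr": sum(neg) / len(neg),
        }
    return rates


def spread(values):
    return max(values) - min(values)


def dpd(preds, labels, groups):
    rates = list(group_rates(preds, labels, groups).values())
    return spread([r["selection"] for r in rates])


def deodds(preds, labels, groups):
    rates = list(group_rates(preds, labels, groups).values())
    return max(spread([r["tpr"] for r in rates]), spread([r["fpr"] for r in rates]))


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A"] * 4 + ["B"] * 4
always_negative = [0] * 8
useful = [1, 0, 1, 0, 1, 1, 0, 0]
for name, preds in [("always_negative", always_negative), ("useful", useful)]:
    print(name, dpd(preds, labels, groups), deodds(preds, labels, groups), accuracy(preds, labels))
```

The always-negative model has DPD = 0 and DEOdds = 0 at 50% accuracy, while the more accurate model (75%) has DEOdds = 0.5. This is the accuracy-equity tension that an equity-scaled performance measure is intended to capture in a single number.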