Harvard Glaucoma Fairness: A Retinal Nerve Disease Dataset for Fairness Learning and Fair Identity Normalization
Yan Luo, Yu Tian, Min Shi, Louis R. Pasquale, Lucy Q. Shen, Nazlee Zebardast, Tobias Elze, Mengyu Wang
TL;DR
Harvard-GF introduces the first publicly available fairness-focused medical imaging dataset. It contains 2D retinal nerve fiber layer thickness (RNFLT) maps and 3D optical coherence tomography (OCT) scans, balanced across White, Black, and Asian groups, for studying fairness in glaucoma detection. The authors propose Fair Identity Normalization (FIN), a plug-and-play normalization that uses identity-specific statistics to equalize feature importance across groups. They also propose an equity-scaled metric that jointly evaluates accuracy and fairness. On both the RNFLT and OCT data, FIN outperforms state-of-the-art fairness methods, notably improving performance for underrepresented racial groups while maintaining overall accuracy. The dataset and code are publicly available, which supports reproducibility and broader fairness benchmarking in medical imaging AI.
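The idea of normalizing with identity-specific statistics can be illustrated with a small sketch. This text does not give FIN's exact formulation, so the version below is an assumption. It standardizes each feature dimension using the mean and variance of the sample's own identity group, in the spirit of group-conditional batch normalization. The function name `fair_identity_normalize` and the `eps` parameter are hypothetical. This is not the authors' implementation, which operates inside a neural network.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def fair_identity_normalize(
    features: Sequence[Sequence[float]],
    identities: Sequence[str],
    eps: float = 1e-5,
) -> List[List[float]]:
    """Hypothetical FIN-style sketch: standardize every feature dimension
    using the mean and variance of the sample's own identity group."""
    if len(features) != len(identities):
        raise ValueError("features and identities must have the same length")
    if not features:
        return []
    dim = len(features[0])

    # Collect sample indices for each identity group (e.g. "Asian", "Black", "White").
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, identity in enumerate(identities):
        groups[identity].append(i)

    out = [[0.0] * dim for _ in features]
    for indices in groups.values():
        for d in range(dim):
            values = [features[i][d] for i in indices]
            mean = sum(values) / len(values)
            # Population variance of this dimension within the group.
            var = sum((v - mean) ** 2 for v in values) / len(values)
            scale = math.sqrt(var + eps)
            for i in indices:
                out[i][d] = (features[i][d] - mean) / scale
    return out


# Two groups whose raw features differ by a large offset.
feats = [[1.0, 10.0], [3.0, 30.0], [100.0, 5.0], [102.0, 7.0]]
ids = ["Black", "Black", "White", "White"]
print(fair_identity_normalize(feats, ids))
```

After this normalization, the large offset between the two groups' first feature disappears. Each group's features are centered at zero with comparable scale, so no single group dominates a feature's magnitude. A trainable version would typically also learn a per-group scale and shift. That would be a further assumption beyond this sketch.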
Abstract
Fairness (used interchangeably with equity) in machine learning is important for societal well-being, but the limited availability of public datasets hinders its progress. Currently, no dedicated public medical datasets with imaging data are available for fairness learning, even though minority groups suffer disproportionately from health issues. To address this gap, we introduce Harvard Glaucoma Fairness (Harvard-GF), a retinal nerve disease dataset with both 2D and 3D imaging data and balanced racial groups for glaucoma detection. Glaucoma is the leading cause of irreversible blindness globally, and Black individuals have twice the glaucoma prevalence of other races. We also propose a fair identity normalization (FIN) approach to equalize feature importance between different identity groups. Compared with various state-of-the-art fairness learning methods, FIN achieves superior performance on racial, gender, and ethnicity fairness tasks with both 2D and 3D imaging data, which demonstrates the utility of Harvard-GF for fairness learning. To facilitate fairness comparisons between models, we propose an equity-scaled performance measure that can be flexibly applied to any performance metric in the context of fairness. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/datasets/harvard-glaucoma-fairness-3300-samples/.
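The abstract states that the equity-scaled measure can wrap any performance metric but does not give its formula here. The sketch below assumes one form consistent with that description: the overall metric divided by one plus the summed absolute gaps between the overall score and each group's score. The names `equity_scaled` and `accuracy` are hypothetical, and accuracy stands in for metrics such as AUC.

```python
from typing import Callable, Sequence

# A metric maps (labels, predictions) to a score where higher is better.
Metric = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def equity_scaled(
    metric: Metric,
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[str],
) -> float:
    """Assumed form: overall / (1 + sum over groups of |overall - group score|)."""
    overall = metric(y_true, y_pred)
    disparity = 0.0
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        group_score = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        disparity += abs(overall - group_score)
    return overall / (1.0 + disparity)


y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
groups = ["A", "A", "B", "B"]
print(equity_scaled(accuracy, y_true, y_pred, groups))  # 0.5
```

Here overall accuracy is 0.75, while groups A and B score 1.0 and 0.5. The summed disparity of 0.5 scales the result down to 0.5. When every group matches the overall score, the measure equals the plain metric. Under this assumed form, any higher-is-better metric with the same signature could be passed in place of `accuracy`.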
