Addressing Fairness Issues in Deep Learning-Based Medical Image Analysis: A Systematic Review

Zikang Xu; Jun Li; Qingsong Yao; Han Li; Mingyue Zhao; S. Kevin Zhou

TL;DR

This paper surveys fairness in deep learning-based medical image analysis (MedIA). It outlines group fairness concepts and key metrics such as Demographic Parity, Accuracy Parity, Equalized Odds, and Equal Opportunity, and gives a formal definition for each. It shows that fairness research in MedIA divides into two branches, fairness evaluation and unfairness mitigation. It synthesizes mitigation methods across the pre-, in-, and post-processing stages and catalogs the datasets used for benchmarking. The review highlights widespread subgroup disparities across modalities such as brain MRI, dermatology images, and chest X-rays, for subgroups defined by attributes such as sex, age, race, and skin tone. It also discusses the tension between mathematical fairness and clinical equity, including the challenges posed by foundation models. Finally, it calls for collaboration among AI researchers, clinicians, ethicists, and policymakers to develop robust, governance-backed strategies that advance equitable MedIA practice.
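For reference, the four criteria named above can be written in standard group fairness notation. This notation is ours and may differ from the paper's. $\hat{Y}$ is the model's binary prediction, $Y$ is the ground-truth label, and $A$ is a sensitive attribute taking values $a$ and $b$ (for example, male and female).

```latex
\begin{aligned}
\text{Demographic Parity:}\quad & P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b) \\
\text{Accuracy Parity:}\quad & P(\hat{Y}=Y \mid A=a) = P(\hat{Y}=Y \mid A=b) \\
\text{Equalized Odds:}\quad & P(\hat{Y}=1 \mid A=a, Y=y) = P(\hat{Y}=1 \mid A=b, Y=y), \quad y \in \{0, 1\} \\
\text{Equal Opportunity:}\quad & P(\hat{Y}=1 \mid A=a, Y=1) = P(\hat{Y}=1 \mid A=b, Y=1)
\end{aligned}
```

Equal Opportunity is the $y = 1$ case of Equalized Odds, so it requires only equal true positive rates. Equalized Odds additionally requires equal false positive rates.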

Abstract

Deep learning algorithms have demonstrated remarkable efficacy in various medical image analysis (MedIA) applications. However, recent research highlights performance disparities when these algorithms are applied to specific subgroups. For example, they can show poorer predictive performance for elderly females. Addressing this fairness issue has become a collaborative effort in which AI scientists and clinicians seek to understand its origins and develop mitigation solutions within MedIA. In this survey, we thoroughly examine current advancements in addressing fairness issues in MedIA, focusing on methodological approaches. We introduce the basics of group fairness and then categorize studies on fair MedIA into fairness evaluation and unfairness mitigation. We also present the detailed methods employed in these studies. Our survey concludes with a discussion of existing challenges and opportunities in establishing fair MedIA and healthcare systems. By offering this comprehensive review, we aim to foster a shared understanding of fairness among AI researchers and clinicians, advance the development of unfairness mitigation methods, and contribute to building an equitable MedIA society.
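Fairness evaluation usually comes down to comparing per-subgroup rates for a trained model. The sketch below is a minimal illustration and is not the survey's own tooling. It assumes binary labels and predictions and assumes that every subgroup contains both positive and negative cases; otherwise the TPR or FPR of that group is NaN. The function names `group_rates` and `fairness_gaps` are hypothetical. The summary of each criterion as a max-minus-min gap across groups, with Equalized Odds taken as the larger of the TPR and FPR gaps, is also our own illustrative choice.

```python
import numpy as np


def group_rates(y_true, y_pred, group):
    """Per-group rates underlying the four group fairness criteria (binary labels assumed)."""
    y_true, y_pred, group = np.asarray(y_true), np.asarray(y_pred), np.asarray(group)
    rates = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        # Assumes each group has both classes; an empty slice would give NaN.
        rates[g] = {
            "positive_rate": yp.mean(),       # P(Yhat=1 | A=g)
            "accuracy": (yp == yt).mean(),    # P(Yhat=Y | A=g)
            "tpr": yp[yt == 1].mean(),        # P(Yhat=1 | Y=1, A=g)
            "fpr": yp[yt == 0].mean(),        # P(Yhat=1 | Y=0, A=g)
        }
    return rates


def fairness_gaps(y_true, y_pred, group):
    """Max-minus-min gap across groups per criterion; 0.0 means the criterion holds exactly."""
    rates = group_rates(y_true, y_pred, group)

    def gap(key):
        values = [r[key] for r in rates.values()]
        return float(max(values) - min(values))

    return {
        "demographic_parity": gap("positive_rate"),
        "accuracy_parity": gap("accuracy"),
        "equal_opportunity": gap("tpr"),
        # Hypothetical aggregation: the worse of the TPR and FPR gaps.
        "equalized_odds": max(gap("tpr"), gap("fpr")),
    }


# Toy example (made-up predictions, not results from the paper).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
sex = ["M"] * 4 + ["F"] * 4
print(fairness_gaps(y_true, y_pred, sex))
# {'demographic_parity': 0.0, 'accuracy_parity': 0.5, 'equal_opportunity': 0.5, 'equalized_odds': 0.5}
```

In this toy case, both groups receive positive predictions at the same rate, so Demographic Parity holds. The female group is nevertheless classified less accurately. This shows why the criteria can disagree and why evaluations usually report several of them.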

Paper Structure (20 sections, 7 figures, 5 tables)

Introduction
The Basics of Group Fairness
Results
Fairness Evaluation
Benchmarking Unfairness in Various MedIA Tasks
Unfairness Source Tracing and Mechanism Discovery
Unfairness Mitigation
Pre-Processing
In-Processing
Post-Processing
Fairness Datasets in MedIA
Discussion
Being Aware of the Sources of Unfairness to Find Corresponding Solutions
Differences between Fair MedIA and Fair Facial Recognition
Mitigating Discrepancies in the Interpretation of Fairness across Diverse Perspectives
...and 5 more sections

Figures (7)

Figure 1: Ideal situations where various fairness criteria are satisfied. From left to right: Demographic Parity, Accuracy Parity, Equalized Odds, Equal Opportunity. The equations below compute the value of different criteria for the Male and Female groups.
Figure 2: In a scenario involving two sensitive attributes, namely sex (male, female) and race (White, Black), demographic parity is achieved with respect to sex but not race.
Figure 3: PRISMA diagram for this review. * denotes that six studies have been counted more than once because they span multiple research directions. FE: Fairness Evaluation; UM: Unfairness Mitigation; Pre: Pre-processing; In: In-processing; Post: Post-processing.
Figure 4: Summary of data extracted from studies in our systematic review: (a) Annual trends in research on fairness in MedIA. (b) Prevalence of various medical imaging modalities, research tasks, and associated sensitive attributes.
Figure 5: Visual disparities between images with different sensitive attributes. (a)(b): images with dark skin and light skin from the Fitzpatrick-17 Dataset groh2021evaluating; (c)(d): images of a male patient and a female patient from the FairSeg Dataset tian2024fairseg.
...and 2 more figures
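The scenario in Figure 2 can be reproduced with a few hypothetical predictions. The sketch below uses made-up data, not the figure's actual values. The helper `demographic_parity_gap` is our own illustrative function. It checks demographic parity separately for each sensitive attribute and shows that satisfying it for one attribute says nothing about another.

```python
import numpy as np


def demographic_parity_gap(y_pred, attribute):
    """Max-minus-min positive prediction rate across the groups of one attribute."""
    y_pred, attribute = np.asarray(y_pred), np.asarray(attribute)
    rates = [y_pred[attribute == g].mean() for g in np.unique(attribute)]
    return float(max(rates) - min(rates))


# Hypothetical binary predictions for eight patients (not data from the paper).
sex = np.array(["M", "M", "M", "M", "F", "F", "F", "F"])
race = np.array(["White", "White", "Black", "Black"] * 2)
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])

print(demographic_parity_gap(y_pred, sex))   # 0.0: parity holds for sex
print(demographic_parity_gap(y_pred, race))  # 1.0: parity fails for race
```

Because each attribute is audited on its own, a model can pass a check on one attribute while failing badly on another. Fairness audits therefore typically report every sensitive attribute of interest rather than a single one.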