The Boundaries of Fair AI in Medical Image Prognosis: A Causal Perspective

Thai-Hoang Pham, Jiayuan Chen, Seungyeon Lee, Yuanlong Wang, Sayoko Moroi, Xueru Zhang, Ping Zhang

TL;DR

This work tackles fairness in medical-image prognosis by introducing FairTTE, a causal framework for analyzing sources of bias in time-to-event predictions. It presents a unified SCM-based setup, decomposes bias into identifiable components, and benchmarks three TTE models across three public datasets with multiple sensitive attributes. Across extensive experiments on over 20,000 models and five fairness methods, it documents pervasive bias and limited mitigation, with fairness tied to distribution shifts and invariant pathway considerations. The findings underscore the need for robust, holistic fairness approaches in prognostic imaging, beyond transferring insights from diagnostic tasks.

Abstract

As machine learning (ML) algorithms are increasingly used in medical image analysis, concerns have emerged about their potential biases against certain social groups. Although many approaches have been proposed to ensure the fairness of ML models, most existing works focus only on medical image diagnosis tasks, such as image classification and segmentation, and overlook prognosis scenarios, which involve predicting the likely outcome or progression of a medical condition over time. To address this gap, we introduce FairTTE, the first comprehensive framework for assessing fairness in time-to-event (TTE) prediction in medical imaging. FairTTE encompasses a diverse range of imaging modalities and TTE outcomes, integrating cutting-edge TTE prediction and fairness algorithms to enable systematic and fine-grained analysis of fairness in medical image prognosis. Leveraging causal analysis techniques, FairTTE uncovers and quantifies distinct sources of bias embedded within medical imaging datasets. Our large-scale evaluation reveals that bias is pervasive across different imaging modalities and that current fairness methods offer limited mitigation. We further demonstrate a strong association between underlying bias sources and model disparities, emphasizing the need for holistic approaches that target all forms of bias. Notably, we find that fairness becomes increasingly difficult to maintain under distribution shifts, underscoring the limitations of existing solutions and the pressing need for more robust, equitable prognostic models.

Paper Structure

This paper contains 80 sections, 2 theorems, 31 equations, 33 figures, 5 tables.

Key Result

Theorem 1

Given a performance metric $\mathop{\mathrm{Er}}\nolimits$ satisfying the triangle inequality and symmetry properties, i.e., $\left| \mathop{\mathrm{Er}}\nolimits(h,h',D) - \mathop{\mathrm{Er}}\nolimits(h,h'',D) \right| \leq \mathop{\mathrm{Er}}\nolimits(h',h'',D)$ and $\mathop{\mathrm{Er}}\nolimits(h,h', with

Figures (33)

  • Figure 1: An overview of FairTTE, a unified framework designed to investigate fairness in TTE prediction for medical image analysis.
  • Figure 2: Causal structure in TTE prediction. Gray circles represent unobserved random variables (RVs). (a) Unbiased setting, where the sensitive attribute $A$ affects only $X_A$. (b) Biased setting, where the sensitive attribute $A$ may be correlated (red arrow) with other RVs in the causal graph.
  • Figure 3: Causal structure in TTE prediction under distribution shift. We illustrate a scenario where unfair causal pathways (blue arrows), induced by the unobserved RV $U$, are present in the training data (a) but absent in the test data (b), leading to distribution shift. Bidirectional arrows indicate that the causal direction may vary depending on the specific context. Fair causal pathways (which appear in both training and test data) may exist but are omitted for simplicity.
  • Figure 4: Per-group performance ($C^{td}$) of TTE prediction models across various datasets and sensitive attribute combinations. The visualized performances correspond to the best models determined by model selection conducted on validation sets. The $95\%$ confidence intervals (CIs) are calculated using bootstrapping over test sets. Definitions of groups 0 and 1 are provided in Appendix \ref{sec:f2}.
  • Figure 5: Quantification of the degree of various sources of bias across all datasets and sensitive attributes. Bias degrees range from 0 to 1, where 0 indicates no bias and 1 represents maximum bias within the datasets.
  • ...and 28 more figures
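
The Figure 4 caption describes per-group performance with $95\%$ CIs obtained by bootstrapping over test sets. The snippet below is a minimal sketch of that idea, not the authors' implementation. It substitutes plain Harrell's concordance index for $C^{td}$, uses synthetic data, and every name in it (`harrell_c`, `bootstrap_ci`, `n_boot`, the simulated `group`, `risk`, `time`, `event` arrays) is a hypothetical choice made for illustration.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's C-index (a simple stand-in for C^td, assumed for illustration).

    A pair (i, j) is comparable when subject i has an observed event and
    subject j has a strictly later time. It is concordant when the model
    assigns i the higher risk; ties in risk count as half.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # subjects known to outlast subject i
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())
    return concordant / comparable if comparable else float("nan")

def bootstrap_ci(time, event, risk, n_boot=500, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap CI over resampled test subjects."""
    time, event, risk = map(np.asarray, (time, event, risk))
    rng = np.random.default_rng(seed)
    n = len(time)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats.append(harrell_c(time[idx], event[idx], risk[idx]))
    lo, hi = np.nanquantile(stats, [alpha / 2, 1 - alpha / 2])
    return harrell_c(time, event, risk), float(lo), float(hi)

# Synthetic test set (hypothetical): higher risk score -> shorter event time.
rng = np.random.default_rng(42)
n = 200
group = rng.integers(0, 2, size=n)          # binary sensitive attribute
risk = rng.normal(size=n)                   # model risk scores
time = rng.exponential(scale=np.exp(-risk)) # event or censoring times
event = rng.random(n) < 0.7                 # True = event observed

for g in (0, 1):
    mask = group == g
    c, lo, hi = bootstrap_ci(time[mask], event[mask], risk[mask], n_boot=200)
    print(f"group {g}: C = {c:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Comparing the two groups' intervals gives a rough sense of whether a performance gap exceeds resampling noise; a full analysis would use the metric and group definitions specified in the paper.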

Theorems & Definitions (5)

  • Theorem 1
  • Proposition 2
  • Definition 3
  • Definition 4
  • Definition 5