
Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

Vincent Jeanselme, Maria De-Arteaga, Zhe Zhang, Jessica Barrett, Brian Tom

TL;DR

This work investigates how clinical missingness patterns influence algorithmic fairness in healthcare ML. It shows that common practices such as group-specific imputation can degrade reconstruction quality or widen fairness gaps under realistic missingness mechanisms, and that minimizing reconstruction error does not guarantee better downstream fairness. The authors propose an empirical framework to guide imputation choices and a reporting standard, Imputation Cards, to communicate how missing data were handled and what that implies. Through synthetic experiments and real-world case studies on MIMIC-III and SUPPORT, they demonstrate that imputation strategy selection should be task- and data-dependent, driven by downstream predictive performance and fairness rather than by reconstruction accuracy alone. The work provides practical tools and guidance for regulators and practitioners to foster transparent, fair deployment of ML in real-world clinical settings.

Abstract

Machine learning risks reinforcing biases present in data and, as we argue in this work, in what is absent from data. In healthcare, societal and decision biases shape patterns in missing data, yet the algorithmic fairness implications of group-specific missingness are poorly understood. The way we address missingness in healthcare can have detrimental impacts on downstream algorithmic fairness. Our work questions current recommendations and practices aimed at handling missing data with a focus on their effect on algorithmic fairness, and offers a path forward. Specifically, we consider the theoretical underpinnings of existing recommendations as well as their empirical predictive performance and corresponding algorithmic fairness measured through subgroup performance. Our results show that current practices for handling missingness lack principled foundations, are disconnected from the realities of missingness mechanisms in healthcare, and can be counterproductive. For example, we show that favouring a group-specific imputation strategy can be misguided and exacerbate prediction disparities. We then build on our findings to propose a framework for empirically guiding imputation choices, and an accompanying reporting framework. Our work constitutes an important contribution to recent efforts by regulators and practitioners to grapple with the realities of real-world data, and to foster the responsible and transparent deployment of machine learning systems. We demonstrate the practical utility of the proposed framework through experimentation on widely used datasets, where we show how the proposed framework can guide the selection of imputation strategies, allowing us to choose among strategies that yield equal overall predictive performance but present different algorithmic fairness properties.
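The selection step described in the abstract's last sentence can be sketched as follows, under our own assumptions rather than the authors' specification: "equal" overall performance is taken to mean within a chosen `tolerance` of the best strategy, and the fairness gap is the largest minus the smallest per-group value of the same metric. `StrategyResult`, `select_strategy`, the group names, and every number below are hypothetical placeholders, not the paper's framework or results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StrategyResult:
    """Downstream evaluation of one imputation strategy (hypothetical container)."""

    name: str
    overall: float               # overall predictive performance, e.g. AUC
    per_group: dict[str, float]  # the same metric computed within each group

    @property
    def gap(self) -> float:
        # Fairness gap (assumed definition): largest minus smallest group performance.
        values = list(self.per_group.values())
        return max(values) - min(values)


def select_strategy(results: list[StrategyResult], tolerance: float = 0.01) -> StrategyResult:
    """Keep strategies within `tolerance` of the best overall performance,
    then return the one with the smallest cross-group gap."""
    best = max(r.overall for r in results)
    candidates = [r for r in results if best - r.overall <= tolerance]
    return min(candidates, key=lambda r: r.gap)


# Placeholder numbers for illustration only; they are not results from the paper.
results = [
    StrategyResult("population mean", 0.80, {"A": 0.82, "B": 0.76}),
    StrategyResult("group mean", 0.80, {"A": 0.84, "B": 0.72}),
    StrategyResult("alternative (hypothetical)", 0.79, {"A": 0.80, "B": 0.78}),
]
chosen = select_strategy(results, tolerance=0.02)
print(chosen.name, round(chosen.gap, 2))  # alternative (hypothetical) 0.02
```

Filtering on overall performance first and only then minimizing the gap mirrors the idea of choosing among strategies with comparable overall performance; the tolerance value is a judgment call that should be reported alongside the choice.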

Paper Structure (91 sections, 3 theorems, 27 equations, 34 figures, 2 tables)

Key Result

Lemma 4.1

Assuming i.i.d. data points $\{x_i\}$, one can express the reconstruction error in group $g$ resulting from group mean imputation as an expression in three terms [equation not recoverable; its terms are annotated "Missingness process", "Standard deviation", and "Variance of unobserved data"], where the missingness process is represented through (i) $\rho_g = \text{Corr}(O, X \mid G = g)$, the unobserved correl…
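Lemma 4.1 names a variance of unobserved data, a standard deviation, and the missingness process as the ingredients of the error. A standard identity with those ingredients can be checked numerically; this is a sketch, not a reproduction of the paper's exact statement. Within one group $g$, write $p_g$ for the share of observed values (our notation, an assumption), $\sigma_g$ for the standard deviation of $X$, and $\rho_g = \text{Corr}(O, X \mid G = g)$. Imputing every missing value with the observed group mean gives an error of $\text{Var}(X \mid O = 0, G = g) + \rho_g^2 \sigma_g^2 / (p_g (1 - p_g))$, where the second term equals the squared gap between observed and unobserved means. The function names and the simulated data are hypothetical.

```python
import numpy as np


def mean_imputation_error(x: np.ndarray, observed: np.ndarray, fill: float) -> float:
    """Mean squared reconstruction error over the unobserved entries of x
    when each of them is replaced by `fill`."""
    return float(np.mean((x[~observed] - fill) ** 2))


def decomposed_error(x: np.ndarray, observed: np.ndarray) -> float:
    """Error of observed-mean imputation rebuilt from the variance of the
    unobserved values plus a shift term set by the missingness process."""
    p = observed.mean()                                 # share of observed values
    sigma = x.std()                                     # standard deviation of X (ddof=0)
    rho = np.corrcoef(observed.astype(float), x)[0, 1]  # Corr(O, X)
    var_unobserved = x[~observed].var()                 # variance of unobserved data
    shift_sq = rho**2 * sigma**2 / (p * (1 - p))        # (observed mean - unobserved mean)^2
    return float(var_unobserved + shift_sq)


# Hypothetical single-group data with "clinical presence":
# larger values are more likely to be measured (missing not at random).
rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
observed = rng.random(1_000) < 1 / (1 + np.exp(-2 * x))

direct = mean_imputation_error(x, observed, x[observed].mean())
print(direct, decomposed_error(x, observed))  # equal up to floating-point rounding
```

The identity holds exactly in-sample because the point-biserial correlation ties $\rho_g$ to the difference between observed and unobserved means; the stronger the dependence between measurement and value, the larger the shift term.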

Figures (34)

  • Figure 1: Examples of group-specific clinical presence mechanisms.
  • Figure 2: Graphs associated with the identified clinical missingness scenarios. Covariates circled with a solid line are observed; dashed ones are unobserved. $Y$ is the condition, $G$ is the group membership, and $X_1$ and $X_2$ are the two covariates. $O_2$ is the decision to observe $X_2$. Red arrows highlight the dependency differences across scenarios. Undirected edges represent problem-specific directed dependencies.
  • Figure 3: Impact of different imputation strategies on algorithmic fairness, given a population marked by group-specific missingness patterns. This paper measures algorithmic fairness at two levels: (i) imputation, i.e., how different imputation strategies impact the quality of the reconstructed data for different groups, (ii) prediction, i.e., how different imputation strategies impact the downstream gap in performance.
  • Figure 4: Graphical summary of clinical missingness in the simulation experiments. Missingness is enforced on $X_2$, affecting 50% of the shaded regions for the indicated group.
  • Figure 5: Impact on reconstruction error: group-specific reconstruction errors across scenarios over 100 synthetic experiments for each missingness pattern. Lower reconstruction error is better.
  • ...and 29 more figures

Theorems & Definitions (6)

  • Definition 4.1: Reconstruction error
  • Definition 4.2: Equal Performance
  • Lemma 4.1: Group and population mean imputations' reconstruction error
  • Theorem 4.1: Comparison of group and population mean imputations' reconstruction error
  • Theorem 4.2: Comparison of group and population mean imputations' fairness gaps
  • Remark
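Theorem 4.1 compares the reconstruction error of group and population mean imputation. The toy case below illustrates why group-specific imputation is not uniformly better; it does not reproduce the theorem's conditions. Per-group error is computed as mean squared error on the unobserved entries (our assumption for Definition 4.1). In group A only high values were measured while the unmeasured ones are low, so the group's observed mean is a poor fill value. `group_errors` and the data are hypothetical.

```python
import numpy as np


def group_errors(x, observed, groups, strategy):
    """Per-group mean squared reconstruction error on unobserved entries.

    strategy: "group" imputes with the observed mean of the same group,
              "population" imputes with the observed mean of all groups.
    """
    population_mean = x[observed].mean()
    errors = {}
    for g in np.unique(groups):
        in_g = groups == g
        missing = x[in_g & ~observed]
        if missing.size == 0:
            continue  # nothing to reconstruct in this group
        fill = x[in_g & observed].mean() if strategy == "group" else population_mean
        errors[str(g)] = float(np.mean((missing - fill) ** 2))
    return errors


# Toy data (hypothetical): group A's unmeasured values resemble group B's values.
x = np.array([8.0, 9.0, 10.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 3.0])
observed = np.array([True, True, True, False, False, False, True, True, True, True])
groups = np.array(["A"] * 6 + ["B"] * 4)

for strategy in ("group", "population"):
    print(strategy, group_errors(x, observed, groups, strategy))
```

Here group mean imputation fills A's gaps with 9, while population mean imputation uses 39/7 (about 5.57), which is closer to the unobserved values 2, 3 and 4, so the population strategy reconstructs group A better.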