Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness
Vincent Jeanselme, Maria De-Arteaga, Zhe Zhang, Jessica Barrett, Brian Tom
TL;DR
This work investigates how clinical missingness patterns influence algorithmic fairness in healthcare ML. It shows that common practices such as group-specific imputation can degrade reconstruction quality or widen fairness gaps under realistic missingness mechanisms, and that minimizing reconstruction error does not guarantee better downstream fairness. The authors propose an empirical framework to guide imputation choices, together with a reporting standard, Imputation Cards, for communicating how missing data were handled and what that handling implies. Through synthetic experiments and real-world case studies on MIMIC-III and SUPPORT, they demonstrate that the choice of imputation strategy should be task- and data-dependent, driven by downstream predictive performance and fairness rather than by reconstruction accuracy alone. The work provides practical tools and guidance to help regulators and practitioners foster the transparent and fair deployment of ML in real-world clinical settings.
Abstract
Machine learning risks reinforcing biases present in data and, as we argue in this work, in what is absent from data. In healthcare, societal and decision biases shape patterns in missing data, yet the algorithmic fairness implications of group-specific missingness are poorly understood. The way we address missingness in healthcare can have detrimental impacts on downstream algorithmic fairness. Our work questions current recommendations and practices for handling missing data, focusing on their effect on algorithmic fairness, and offers a path forward. Specifically, we examine the theoretical underpinnings of existing recommendations, as well as their empirical predictive performance and the corresponding algorithmic fairness, measured through subgroup performance. Our results show that current practices for handling missingness lack principled foundations, are disconnected from the realities of missingness mechanisms in healthcare, and can be counterproductive. For example, we show that favouring group-specific imputation strategies can be misguided and can exacerbate prediction disparities. We then build on our findings to propose a framework for empirically guiding imputation choices, along with an accompanying reporting framework. Our work contributes to recent efforts by regulators and practitioners to grapple with the realities of real-world data and to foster the responsible and transparent deployment of machine learning systems. We demonstrate the practical utility of the proposed framework through experiments on widely used datasets, showing how it can guide the selection of imputation strategies by allowing us to choose among strategies that yield equal overall predictive performance but differ in their algorithmic fairness properties.
