Detecting clinician implicit biases in diagnoses using proximal causal inference

Kara Liu, Russ Altman, Vasilis Syrgkanis

TL;DR

This work tackles the challenge of measuring clinician implicit bias in diagnostic decisions using large observational health data. It applies proximal causal inference, with patient health measurements as proxies and a partially linear bridge function $q$, to identify the direct (bias) effect $\theta$ of a sociodemographic attribute $D$ on a diagnosis $Y$. Estimation relies on Neyman-orthogonal moments and a residual instrument $V = \tilde{D} - \gamma^T \tilde{Z}$. Identification rests on relaxed assumptions that make proximal mediation feasible in real-world data; a proxy-selection algorithm chooses valid proxies, and several diagnostic tests assess instrument strength. Empirical evaluation on semi-synthetic data and the UK Biobank detects biases for several $(D,Y)$ pairs, remains robust under weak-instrument conditions, and offers insights into intersectionality and influential patient subgroups. The approach provides a practical bias-detection tool for data audits and can inform efforts to reduce systemic discrimination in healthcare.
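The estimation idea above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes scalar proxies $X$ and $Z$, a bridge of the form $q(D, X, W) = \theta D + \beta X + g(W)$, and plain OLS residualization on $W$ in place of the flexible, cross-fitted nuisance models one would normally use. The function names (`residualize`, `proximal_theta`, `simulate`) and the data-generating process are hypothetical.

```python
import numpy as np

def residualize(target, W):
    """Return target minus its OLS fit on [1, W] (a linear stand-in for E[target | W])."""
    design = np.column_stack([np.ones(len(W)), W])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def proximal_theta(Y, D, X, Z, W):
    """Direct-effect estimate for a hypothetical bridge q = theta*D + beta*X + g(W)."""
    Yt, Dt, Xt, Zt = (residualize(a, W) for a in (Y, D, X, Z))
    # Residual instrument V = D~ - gamma * Z~, with gamma chosen so that
    # sum(V * X~) = 0, i.e. V is orthogonal to the residualized outcome proxy.
    gamma = np.dot(Dt, Xt) / np.dot(Zt, Xt)
    V = Dt - gamma * Zt
    # Sample version of E[(Y~ - theta*D~ - beta*X~) * V] = 0, solved for theta;
    # the beta term drops out because V is orthogonal to X~.
    return np.dot(Yt, V) / np.dot(Dt, V)

def simulate(n, theta=0.5, b=1.0, seed=0):
    """Toy data: D -> U (unobserved health) -> Y plus a direct D -> Y effect theta."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=n)                             # observed covariate
    D = (W + rng.normal(size=n) > 0).astype(float)     # binary attribute
    U = 0.8 * D + 0.5 * W + rng.normal(size=n)         # unobserved mediator
    Y = theta * D + b * U + 0.3 * W + rng.normal(size=n)
    Z = U + rng.normal(size=n)                         # treatment-side proxy
    X = U + rng.normal(size=n)                         # outcome-side proxy
    return Y, D, X, Z, W

Y, D, X, Z, W = simulate(200_000)
theta_hat = proximal_theta(Y, D, X, Z, W)   # should land near the true theta = 0.5
Yt, Dt = residualize(Y, W), residualize(D, W)
naive = np.dot(Yt, Dt) / np.dot(Dt, Dt)     # absorbs the mediated path: roughly 0.5 + 0.8 * 1.0
```

Algebraically, the ratio in `proximal_theta` equals the $\theta$ coefficient from a just-identified IV regression of $\tilde{Y}$ on $(\tilde{D}, \tilde{X})$ with instruments $(\tilde{D}, \tilde{Z})$; writing it through $V$ makes the role of instrument strength visible, since the estimate degrades when $\sum_i \tilde{Z}_i \tilde{X}_i$ is close to zero.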

Abstract

Clinical decisions to treat and diagnose patients are affected by implicit biases formed by racism, ableism, sexism, and other stereotypes. These biases reflect broader systemic discrimination in healthcare and risk marginalizing already disadvantaged groups. Existing methods for measuring implicit biases require controlled randomized testing and only capture individual attitudes rather than outcomes. However, the "big-data" revolution has led to the availability of large observational medical datasets, like EHRs and biobanks, that provide the opportunity to investigate discrepancies in patient health outcomes. In this work, we propose a causal inference approach to detect the effect of clinician implicit biases on patient outcomes in large-scale medical data. Specifically, our method uses proximal mediation to disentangle pathway-specific effects of a patient's sociodemographic attribute on a clinician's diagnosis decision. We test our method on real-world data from the UK Biobank. Our work can serve as a tool that initiates conversation and brings awareness to unequal health outcomes caused by implicit biases.

Paper Structure

This paper contains 37 sections, 15 theorems, 180 equations, 11 figures, 14 tables, 3 algorithms.

Key Result

Theorem 2.1

We present more intuitive interpretations of each theorem and lemma in the Appendix. Consider a non-parametric structural causal model (SCM) that respects the causal relationships encoded in Figure 1 (see the Appendix), and assume there exists a "bridge function" $q$ that solves … . Then the controlled direct effect can be identified as $\theta = \mathbb{E}[q(1, X, W) - q(0, X, W)]$.
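To see why a partially linear bridge (Lemma 1) reduces this contrast to a single coefficient, suppose, as an illustrative assumption, that $q(d, x, w) = \theta d + h(x, w)$ for some function $h$. Substituting into the identification formula of Theorem 2.1:

```latex
\begin{aligned}
\mathbb{E}[q(1, X, W) - q(0, X, W)]
  &= \mathbb{E}\big[(\theta \cdot 1 + h(X, W)) - (\theta \cdot 0 + h(X, W))\big] \\
  &= \mathbb{E}[\theta] = \theta .
\end{aligned}
```

Any term of $q$ that does not depend on $d$ cancels inside the expectation, so the controlled direct effect is exactly the coefficient on $d$.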

Figures (11)

  • Figure 1: Assumed causal graph.
  • Figure 2: Comparing effect estimates for six $(D,Y)$ pairs: A) using all data with weak instrument and influence set removal (the numbers next to the yellow bars give the size of each set of high-influence points); B) using bootstrapped 50% subsamples of the data at different stages of re-estimation.
  • Figure 3: A) Interpretability of high-influence points. B) Income stratification.
  • Figure F1: Potential causal graph causing the violations of the primal and the dual. The violation of the primal is caused by some $Z' \rightarrow M_Z \rightarrow Y$, and the violation of the dual is caused by some $X'$ such that $D \rightarrow M_X \rightarrow X'$.
  • Figure F2: Correlation clustermap comparing proxy $X$ features selected by the proxy removal algorithm across $(D,Y)$ pairs that (1) yielded admissible $\{(\mathcal{X}^{(i)}, \mathcal{Z}^{(i)})\}_i$ candidates and (2) whose median point estimate $\mid \theta \mid > 0.05$. COPD = Chronic obstructive pulmonary disease. p.s. = post-secondary.
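Figure 2B re-estimates effects on random 50% subsamples of the data. A generic version of that stability check can be sketched as follows; `subsample_estimates` and the difference-in-means estimator are hypothetical stand-ins, and in practice one would pass the full bias estimator instead of the toy one used here.

```python
import numpy as np

def subsample_estimates(arrays, estimator, frac=0.5, n_reps=200, seed=0):
    """Re-run `estimator` on random subsamples drawn without replacement.

    arrays: sequence of equal-length 1-D arrays (one per variable).
    estimator: callable taking the subsampled arrays and returning a float.
    Returns an array of `n_reps` estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(arrays[0])
    m = int(frac * n)                      # subsample size, e.g. 50% of the data
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        idx = rng.choice(n, size=m, replace=False)
        estimates[r] = estimator(*(a[idx] for a in arrays))
    return estimates

def diff_in_means(y, d):
    """Toy estimator (hypothetical stand-in for the paper's effect estimator)."""
    return y[d == 1].mean() - y[d == 0].mean()

rng = np.random.default_rng(1)
d = rng.integers(0, 2, size=10_000)
y = 0.5 * d + rng.normal(size=10_000)
est = subsample_estimates([y, d], diff_in_means)
median = np.median(est)                     # summary analogous to a median point estimate
lo, hi = np.percentile(est, [2.5, 97.5])    # spread across subsamples
```

Here the spread across subsamples is used only as a stability check on the point estimate, not as a calibrated confidence interval.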

Theorems & Definitions (15)

  • Theorem 2.1: Identification
  • Lemma 1: Identification under partial linearity
  • Theorem 2.2
  • Lemma D.1
  • Lemma D.2
  • Lemma D.3
  • Lemma D.4
  • Lemma D.5
  • Corollary D.1
  • Theorem D.1