Revealing Treatment Non-Adherence Bias in Clinical Machine Learning Using Large Language Models

Zhongyuan Liang, Arvind Suresh, Irene Y. Chen

TL;DR

This study investigates how treatment non-adherence in electronic health records biases clinical machine learning models for hypertension. It leverages a large language model to extract adherence signals from clinical notes in a 3,623-patient cohort, identifying 786 non-adherent individuals and revealing demographic, clinical, and patient-reported reasons for non-adherence. The authors demonstrate that ignoring adherence bias can reverse treatment effects in causal inference and reduce predictive performance by up to 5%, while simultaneously widening fairness disparities; they also show that removing non-adherent data can improve both accuracy and equity. The work illustrates a practical pipeline for incorporating adherence information into real-world clinical ML workflows and highlights the need for responsible, equitable modeling that accounts for treatment adherence bias across diseases.

Abstract

Machine learning systems trained on electronic health records (EHRs) increasingly guide treatment decisions, but their reliability depends on the critical assumption that patients follow the prescribed treatments recorded in EHRs. Using EHR data from 3,623 hypertension patients, we investigate how treatment non-adherence introduces implicit bias that can fundamentally distort both causal inference and predictive modeling. By extracting patient adherence information from clinical notes using a large language model (LLM), we identify 786 patients (21.7%) with medication non-adherence. We further uncover key demographic and clinical factors associated with non-adherence, as well as patient-reported reasons including side effects and difficulties obtaining refills. Our findings demonstrate that this implicit bias can not only reverse estimated treatment effects, but also degrade model performance by up to 5% while disproportionately affecting vulnerable populations by exacerbating disparities in decision outcomes and model error rates. This highlights the importance of accounting for treatment non-adherence in developing responsible and equitable clinical machine learning systems.
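The reversal of an estimated treatment effect can be illustrated with a toy calculation. The sketch below uses hand-made, hypothetical records (not data from the study): drug A is assumed to lower blood pressure more than drug B when taken, but half of the patients prescribed A do not take it. Comparing outcomes by recorded prescription then makes B look better, while restricting to adherent patients recovers the assumed ordering. Restricting to adherent patients is only one simple adjustment and is not, in general, an unbiased estimator.

```python
from statistics import mean

# Hypothetical records: (prescribed_drug, adherent, bp_reduction_mmHg).
# All numbers are invented for illustration only.
records = (
    [("A", True, 10.0)] * 5     # took drug A: 10 mmHg reduction
    + [("A", False, 0.0)] * 5   # prescribed A but did not take it: no reduction
    + [("B", True, 6.0)] * 10   # took drug B: 6 mmHg reduction
)


def mean_reduction(rows, drug, adherent_only=False):
    """Mean blood-pressure reduction for one prescribed drug.

    With adherent_only=True, patients flagged as non-adherent are excluded.
    """
    values = [
        reduction
        for prescribed, adherent, reduction in rows
        if prescribed == drug and (adherent or not adherent_only)
    ]
    return mean(values)


# Comparison by recorded prescription, ignoring adherence.
naive_diff = mean_reduction(records, "A") - mean_reduction(records, "B")

# Comparison restricted to patients who actually took the medication.
adherent_diff = (
    mean_reduction(records, "A", adherent_only=True)
    - mean_reduction(records, "B", adherent_only=True)
)

print(f"A minus B, ignoring adherence: {naive_diff:+.1f} mmHg")
print(f"A minus B, adherent patients only: {adherent_diff:+.1f} mmHg")
```

The sign of the difference flips between the two comparisons, which is the kind of distortion the abstract describes.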

Paper Structure

This paper contains 23 sections, 2 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of cohort selection, LLM non-adherence extraction, and non-adherence analysis. (a) We select 3,623 hypertension patients and pair their visits, with hypertension medication prescribed at the first visit and clinical notes extracted from the second. (b) These notes are then processed by an LLM to identify treatment non-adherence, with outputs validated through clinician annotations. (c) We further perform topic modeling to uncover reasons for non-adherence and assess the harmful impact of ignoring this bias on predictive modeling performance and treatment effect estimation.
  • Figure 2: BERT topic modeling results for treatment non-adherence reasons. Side effects are the dominant reason for non-adherence, and 17.7% of reasons are due to forgetfulness, while others are related to not picking up the medication, needing a refill, or losing it.
  • Figure 3: Results of varying treatment non-adherence data percentage on model performance and fairness. Increasing the proportion of non-adherent data in the training set degrades predictive performance and increases fairness disparities between Black and non-Black patients, as measured by demographic parity and the equalized odds criterion (true positive rate and false positive rate differences). Results are averaged over 100 seeds, with error bars representing the standard error of the mean.
  • Figure 5: Cohort Selection
  • Figure 6: Prompt used in the study
  • ...and 3 more figures
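
The fairness quantities named in the Figure 3 caption can be sketched as follows. This is a minimal illustration using standard definitions, not the authors' code: the function names `fairness_gaps` and `sem`, the binary group encoding, and the example arrays are all assumptions made here.

```python
from math import sqrt
from statistics import stdev


def _rate(preds):
    """Fraction of positive predictions; assumes a non-empty list of 0/1."""
    return sum(preds) / len(preds)


def fairness_gaps(y_true, y_pred, group):
    """Absolute between-group gaps for binary labels and predictions.

    `group` holds 1 for one group and 0 for the other (hypothetical encoding).
    Raises ZeroDivisionError if a group has no positives or no negatives.
    """
    def preds_for(g, label=None):
        return [
            p
            for t, p, member in zip(y_true, y_pred, group)
            if member == g and (label is None or t == label)
        ]

    return {
        # Demographic parity: gap in positive prediction rates.
        "demographic_parity": abs(_rate(preds_for(1)) - _rate(preds_for(0))),
        # Equalized odds, part 1: gap in true positive rates.
        "tpr_diff": abs(_rate(preds_for(1, 1)) - _rate(preds_for(0, 1))),
        # Equalized odds, part 2: gap in false positive rates.
        "fpr_diff": abs(_rate(preds_for(1, 0)) - _rate(preds_for(0, 0))),
    }


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Invented example data, for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]

gaps = fairness_gaps(y_true, y_pred, group)
print(gaps)
```

In a setup like the one the caption describes, `fairness_gaps` would be computed once per random seed and `sem` applied to the resulting list of values to obtain the error bars.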