Table of Contents
Fetching ...

Equitable Length of Stay Prediction for Patients with Learning Disabilities and Multiple Long-term Conditions Using Machine Learning

Emeka Abakasanga, Rania Kousovista, Georgina Cosma, Ashley Akbari, Francesco Zaccardi, Navjot Kaur, Danielle Fitt, Gyuchan Thomas Jun, Reza Kiani, Satheesh Gangadharan

TL;DR

The potential of applying machine learning models with effective bias mitigation approaches on EHR data sources to enable equitable prediction of hospital stays by addressing data imbalances across groups is demonstrated.

Abstract

People with learning disabilities have a higher mortality rate and premature deaths compared to the general public, as reported in published research in the UK and other countries. This study analyses hospitalisations of 9,618 patients identified with learning disabilities and long-term conditions for the population of Wales using electronic health record (EHR) data sources from the SAIL Databank. We describe the demographic characteristics, prevalence of long-term conditions, medication history, hospital visits, and lifestyle history for our study cohort, and apply machine learning models to predict the length of hospital stays for this cohort. The random forest (RF) model achieved an Area Under the Curve (AUC) of 0.759 (males) and 0.756 (females), a false negative rate of 0.224 (males) and 0.229 (females), and a balanced accuracy of 0.690 (males) and 0.689 (females). After examining model performance across ethnic groups, two bias mitigation algorithms (threshold optimization and the reductions algorithm using an exponentiated gradient) were applied to minimise performance discrepancies. The threshold optimizer algorithm outperformed the reductions algorithm, achieving lower ranges in false positive rate and balanced accuracy for the male cohort across the ethnic groups. This study demonstrates the potential of applying machine learning models with effective bias mitigation approaches on EHR data sources to enable equitable prediction of hospital stays by addressing data imbalances across groups.

Equitable Length of Stay Prediction for Patients with Learning Disabilities and Multiple Long-term Conditions Using Machine Learning

TL;DR

The potential of applying machine learning models with effective bias mitigation approaches on EHR data sources to enable equitable prediction of hospital stays by addressing data imbalances across groups is demonstrated.

Abstract

People with learning disabilities have a higher mortality rate and premature deaths compared to the general public, as reported in published research in the UK and other countries. This study analyses hospitalisations of 9,618 patients identified with learning disabilities and long-term conditions for the population of Wales using electronic health record (EHR) data sources from the SAIL Databank. We describe the demographic characteristics, prevalence of long-term conditions, medication history, hospital visits, and lifestyle history for our study cohort, and apply machine learning models to predict the length of hospital stays for this cohort. The random forest (RF) model achieved an Area Under the Curve (AUC) of 0.759 (males) and 0.756 (females), a false negative rate of 0.224 (males) and 0.229 (females), and a balanced accuracy of 0.690 (males) and 0.689 (females). After examining model performance across ethnic groups, two bias mitigation algorithms (threshold optimization and the reductions algorithm using an exponentiated gradient) were applied to minimise performance discrepancies. The threshold optimizer algorithm outperformed the reductions algorithm, achieving lower ranges in false positive rate and balanced accuracy for the male cohort across the ethnic groups. This study demonstrates the potential of applying machine learning models with effective bias mitigation approaches on EHR data sources to enable equitable prediction of hospital stays by addressing data imbalances across groups.

Paper Structure

This paper contains 27 sections, 4 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Consort Flow Diagram.
  • Figure 2: The top five primary conditions and top five prevalent (common) conditions treated or investigated during hospitalisations for the (a) males and (b) females with ld and mltc. The primary conditions indicate the main condition treated or investigated during the admission, and the prevalent conditions indicate the most frequently treated or investigated conditions (includes both primary and secondary diagnostic codes) across all unique hospitalisations of patients with ld and mltc.
  • Figure 3: Distribution of patients with los $\ge$ 129 days. (a) The $4$ most common conditions and (b) age distribution of patients with and without mental illness.
  • Figure 4: Distribution of mean mltc count across age groups for (a) combined sexes, and (b) individual sexes.
  • Figure 5: roc curves across models indicating their optimal points (fpr and tpr) for (a) male and (b) female cohort.
  • ...and 9 more figures