Predicting Emergency Department Visits for Patients with Type II Diabetes

Javad M Alizadeh, Jay S Patel, Gabriel Tajeu, Yuzhou Chen, Ilene L Hollin, Mukesh K Patel, Junchao Fei, Huanmei Wu

TL;DR

The paper predicts emergency department (ED) visits among adults with Type II diabetes by integrating electronic health records with social determinants of health (SDoH). It compares multiple machine learning models using tenfold cross-validation on a large HSX dataset and a curated 87-feature set, achieving an AUC of up to approximately 0.82. Key predictors include age, visit-interval metrics, certain ICD-10 diagnoses, SDoH indicators such as income and education, and vital signs, which points to multifactorial drivers of ED utilization. The findings support risk stratification and resource planning in ED settings and outline a practical deployment framework with monitoring, feedback, and policy considerations.

Abstract

Over 30 million Americans are affected by Type II diabetes (T2D), a treatable condition with significant health risks. This study aims to develop and validate predictive models using machine learning (ML) techniques to estimate emergency department (ED) visits among patients with T2D. Data for these patients were obtained from the HealthShare Exchange (HSX), focusing on demographic details, diagnoses, and vital signs. Our sample contained 34,151 patients diagnosed with T2D, who accounted for 703,065 visits overall between 2017 and 2021. A workflow integrated electronic medical record (EMR) data with social determinants of health (SDoH) for ML predictions. A total of 87 out of 2,555 features were selected for model construction. Various ML algorithms, including CatBoost, Ensemble Learning, K-nearest Neighbors (KNN), Support Vector Classification (SVC), Random Forest, and Extreme Gradient Boosting (XGBoost), were employed with tenfold cross-validation to predict whether a patient is at risk of an ED visit. The areas under the receiver operating characteristic (ROC) curve (AUC) for Random Forest, XGBoost, Ensemble Learning, CatBoost, KNN, and SVC were 0.82, 0.82, 0.82, 0.81, 0.72, and 0.68, respectively. Ensemble Learning and Random Forest models demonstrated superior predictive performance in terms of discrimination, calibration, and clinical applicability. These models are reliable tools for predicting risk of ED visits among patients with T2D. They can estimate future ED demand and assist clinicians in identifying critical factors associated with ED utilization, enabling early interventions to reduce such visits. The top five important features were age, the difference between visitation gaps, visitation gaps, ICD-10 code R10 (abdominal and pelvic pain), and the Index of Concentration at the Extremes (ICE) for income.
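The evaluation protocol described above (tenfold cross-validation scored by the area under the ROC curve) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than HSX records, the function names (`roc_auc`, `k_fold_indices`, `knn_score`, `cross_validated_auc`, `make_synthetic`) are hypothetical, a small K-nearest-neighbors scorer stands in for the six model families, and averaging the per-fold AUCs is one common convention that the abstract does not confirm.

```python
import random


def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_score(train_X, train_y, x, k=5):
    """Risk score: fraction of positives among the k nearest training points."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=sq_dist)[:k]
    return sum(train_y[i] for i in nearest) / k


def cross_validated_auc(X, y, k_folds=10, seed=0):
    """Mean AUC over k folds; each fold is scored by a model fit on the other folds."""
    aucs = []
    for held_out in k_fold_indices(len(X), k_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        scores = [knn_score(train_X, train_y, X[i]) for i in held_out]
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return sum(aucs) / len(aucs)


def make_synthetic(n=300, seed=1):
    """Synthetic stand-in data: two features, shifted upward for the positive class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced classes; 1 plays the role of "ED visit"
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    print(f"Mean tenfold AUC on synthetic data: {cross_validated_auc(X, y):.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.68 to 0.82 should be read. In practice, a library such as scikit-learn would typically replace the hand-written fold splitting and scoring shown here.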

Paper Structure

This paper contains 12 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The overview of our project workflow.
  • Figure 2: Percentage of T2D over ZIP code population.
  • Figure 3: Age distributions of patients with T2D from Philadelphia among different gender and race groups. The percentage shows the proportion of a gender/race to the total population of a race group. Not all groups are shown for better illustration.
  • Figure 4: Exploratory visit statistics for patients with T2D: violin plots of the gaps between ED and non-ED visits, (a) without outliers and (b) with outliers.
  • Figure 5: The Receiver Operating Characteristic (ROC) curves of the predictive models and their corresponding evaluation metrics.
  • ...and 1 more figures