Predicting All-Cause Hospital Readmissions from Medical Claims Data of Hospitalised Patients

Avinash Kadimisetty, Arun Rajagopalan, Vijendra SK

TL;DR

This work predicts all-cause hospital readmissions from health insurance claims data. It compares Logistic Regression, Random Forest, and Support Vector Machines, together with PCA-based dimensionality reduction, to identify patients at high risk of readmission within 30 days of discharge. The study constructs episode-level predictors from demographics, comorbidities, length of stay (LOS), medications, prior admissions, emergency department (ED) visits, admitting diagnoses, and procedures. It evaluates the models on an 80/20 train/test split, with AUC as the primary metric. Random Forest performs best on the test set (AUC of about 0.67), suggesting that such models could inform targeted interventions to reduce costly readmissions. The authors highlight implications for healthcare quality and cost containment. They propose future work on condition-specific models and on using pre-index and post-index data to deepen predictive insights.
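How episodes are built from claims determines both the readmission label and predictors such as prior admissions. The following is a minimal sketch, not the authors' procedure. It assumes a hypothetical record layout with one row per inpatient admission (`patient_id`, `admit`, `discharge`). An index stay is labeled as readmitted if the same patient's next admission begins 0 to 30 days after discharge. Every earlier admission counts toward prior admissions, with no lookback window. The paper's exact episode rules are not stated here, so rules for transfers or planned readmissions are also assumptions left out of the sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical claims layout: one record per inpatient admission.
@dataclass(frozen=True)
class Admission:
    patient_id: str
    admit: date
    discharge: date

READMIT_WINDOW = timedelta(days=30)  # 30-day all-cause window


def label_episodes(admissions):
    """Return (index_stay, prior_admissions, readmitted_30d) for every stay."""
    by_patient = {}
    for adm in admissions:
        by_patient.setdefault(adm.patient_id, []).append(adm)

    rows = []
    for stays in by_patient.values():
        stays.sort(key=lambda a: a.admit)  # chronological order per patient
        for i, index_stay in enumerate(stays):
            prior_admissions = i  # all earlier stays (assumed: no lookback window)
            readmitted = False
            if i + 1 < len(stays):
                gap = stays[i + 1].admit - index_stay.discharge
                # A next admission starting 0-30 days after discharge counts.
                readmitted = timedelta(0) <= gap <= READMIT_WINDOW
            rows.append((index_stay, prior_admissions, readmitted))
    return rows


# Toy claims for two hypothetical patients.
claims = [
    Admission("p1", date(2020, 1, 1), date(2020, 1, 5)),
    Admission("p1", date(2020, 1, 20), date(2020, 1, 22)),
    Admission("p1", date(2020, 4, 1), date(2020, 4, 3)),
    Admission("p2", date(2020, 2, 1), date(2020, 2, 4)),
]
for stay, prior, readmitted in label_episodes(claims):
    print(stay.patient_id, stay.admit, prior, readmitted)
```

A patient's last stay is labeled as not readmitted here. In practice, stays whose 30-day window extends past the end of the claims extract would usually be excluded, because their outcome is unobserved. The sketch does not do that.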

Abstract

Reducing preventable hospital readmissions is a national priority for payers, providers, and policymakers seeking to improve healthcare and lower costs. The readmission rate is being used as a benchmark for the quality of care that hospitals provide. In this project, we used machine learning techniques (Logistic Regression, Random Forest, and Support Vector Machines) to analyze health claims data. Our goal was to identify the demographic and medical factors that play a crucial role in predicting all-cause readmissions. Because health claims data are high-dimensional, we used Principal Component Analysis (PCA) for dimensionality reduction and used the results to build the predictive models. We compared and evaluated the models using the Area Under the ROC Curve (AUC). The Random Forest model performed best, followed by the Logistic Regression and Support Vector Machine models. These models can be used to identify the key factors driving readmissions and the patients on whom prevention efforts should focus. This can ultimately lower costs and improve the quality of care provided to patients.
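PCA projects the centered feature matrix onto the directions of greatest variance, and those projections become the new, lower-dimensional predictors. The sketch below is a small pure-Python illustration, not the authors' implementation. It computes the sample covariance matrix, extracts the leading eigenvectors by power iteration with deflation, and reports the explained-variance ratio. The helper names (`center`, `covariance`, `top_components`, `project`) are hypothetical. A real pipeline would more likely call a library such as scikit-learn. It would also usually standardize features first, because claims variables such as visit counts, LOS in days, and binary comorbidity flags are on different scales.

```python
import math
import random


def center(X):
    """Subtract each column mean; returns (centered rows, means)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X], means


def covariance(Xc):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(C, v):
    return [sum(c * x for c, x in zip(row, v)) for row in C]


def top_components(C, k, iters=200, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    C = [row[:] for row in C]  # work on a copy; deflation modifies it
    d = len(C)
    components, variances = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))  # Rayleigh quotient
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, variances


def project(Xc, components):
    """Scores of each centered row on each component (the reduced features)."""
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components]
            for row in Xc]


# Toy data: the second feature is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
Xc, means = center(X)
C = covariance(Xc)
components, variances = top_components(C, k=1)
total_variance = sum(C[i][i] for i in range(len(C)))  # trace of C
print(variances[0] / total_variance)  # explained-variance ratio of component 1
print(project(Xc, components))
```

Because the two toy features are perfectly correlated, a single component captures essentially all of the variance. With real claims features, the number of components would typically be chosen from the cumulative explained-variance ratio.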

Paper Structure

This paper contains 21 sections, 3 figures, 16 tables.

Figures (3)

  • Figure 1: Train ROC Curve for Random Forest
  • Figure 2: Test ROC Curve for Random Forest
  • Figure 3: Important Features from Random Forest
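Figures 1 and 2 plot ROC curves, and the AUC used to compare the models is the area under such a curve. The following is an illustration only, written in pure Python with hypothetical helper names rather than the authors' code. It assumes labels coded 1 for a 30-day readmission and 0 otherwise. The ROC points come from sweeping a threshold down through the predicted scores, and the AUC is computed with the trapezoidal rule. Tied scores are processed together, so each group of ties contributes a single diagonal segment.

```python
def roc_curve(y_true, scores):
    """ROC points (fpr, tpr), sweeping the threshold from the highest score down."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both readmitted (1) and non-readmitted (0) episodes")
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all episodes tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy labels (1 = readmitted within 30 days) and model scores.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve(y, s)
print(auc(fpr, tpr))
```

With these toy scores, three of the four readmitted/non-readmitted pairs are ranked correctly, so the AUC is 0.75. This matches the reading of AUC as the probability that a randomly chosen readmitted episode scores higher than a randomly chosen non-readmitted one.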