Survival Meets Classification: A Novel Framework for Early Risk Prediction Models of Chronic Diseases

Shaheer Ahmad Khan, Muhammad Usamah Shahid, Muddassar Farooq

Abstract

Chronic diseases are long-lasting conditions that require lifelong medical attention. Using large-scale electronic medical record (EMR) data, we develop early risk prediction models for five common chronic diseases: diabetes, hypertension, chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), and chronic ischemic heart disease. Traditional models for predicting chronic disease risk rely on either survival analysis or classification, but not both. We present a novel approach that integrates the two: we show that survival analysis methods can be re-engineered to perform classification efficiently and effectively, making them a comprehensive tool for building disease risk surveillance models. In experiments on real-world EMR data, the survival models achieve accuracy, F1 score, and AUROC comparable to or better than those of prior state-of-the-art models such as LightGBM and XGBoost. Finally, the proposed survival models use a novel methodology to generate explanations, which have been clinically validated by a panel of three expert physicians.
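
To make the idea of using a survival model as a classifier concrete, here is a minimal sketch of one plausible rule: read each patient's predicted survival probability at a fixed horizon and flag the patient as at risk when the implied event probability crosses a threshold. This is an illustration only, not necessarily the authors' procedure; the function names, the 24-month horizon, the 0.5 threshold, and the toy survival curves are all hypothetical assumptions.

```python
from bisect import bisect_right


def survival_at(times, surv_probs, horizon):
    """Evaluate a step-function survival curve S(t) at `horizon`.

    `times` must be sorted ascending and `surv_probs[i]` is S(times[i]).
    Before the first listed time, S(t) is taken to be 1.0.
    """
    idx = bisect_right(times, horizon) - 1
    return 1.0 if idx < 0 else surv_probs[idx]


def classify_by_survival(times, curves, horizon, threshold=0.5):
    """Label a patient 1 (at risk) if 1 - S(horizon) >= threshold, else 0.

    Assumption: a single fixed horizon and threshold; both are
    hypothetical tuning choices, not values taken from the paper.
    """
    labels = []
    for curve in curves:
        risk = 1.0 - survival_at(times, curve, horizon)
        labels.append(1 if risk >= threshold else 0)
    return labels


# Toy inputs (hypothetical): time grid in months and two predicted curves.
times = [6, 12, 24, 36]
curves = [
    [0.99, 0.97, 0.95, 0.90],  # low-risk patient
    [0.90, 0.70, 0.40, 0.20],  # high-risk patient
]

labels = classify_by_survival(times, curves, horizon=24, threshold=0.5)
print(labels)  # → [0, 1]
```

In practice the curves would come from a fitted survival model, and the horizon and threshold would be chosen on validation data rather than fixed in advance.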

Paper Structure (19 sections, 7 figures, 12 tables)

Figures (7)

  • Figure 1: Distribution of observation times for hypertension using the three described approaches. Similar plots for the remaining diseases can be found in Appendix \ref{ap: time_distributions}.
  • Figure 2: Average survival curves for the hypertension training set. Clusters are formed from the true labels and from the labels predicted by classifying on survival probability. Similar curves for the remaining diseases can be found in Appendix \ref{ap: survival_curves}.
  • Figure 3: Average F1 scores of LGBM and RSF using the three classification techniques.
  • Figure 4: Feature importances found using SurvSHAP and our custom implementation. Similar plots for the remaining diseases can be found in Appendix \ref{ap: expalanations}.
  • Figure :
  • ...and 2 more figures