Explainable Machine Learning System for Predicting Chronic Kidney Disease in High-Risk Cardiovascular Patients

Nantika Nguycharoen

TL;DR

This work presents an explainable CKD prediction system tailored to high-risk cardiovascular patients, trained on retrospective UAE data with SMOTE-NC applied to address class imbalance. The Random Forest model achieves a sensitivity of 0.882, prioritizing the reduction of false negatives for screening purposes, while an integrated explainability framework delivers global and local interpretations, bias and safety assessments, and a biomedical relevance check through anchored rules. The global SHAP analysis identifies key predictors such as DM_medication usage, eGFR, ACEI_ARB usage, diabetes status, and HbA1C; local interpretations use prototypes and counterfactuals via the What-If Tool to explain individual predictions. The study also conducts bias and safety evaluations, finding no significant gender bias but some dependence on initial eGFR, and the model behaves safely in edge-case testing. Although limited by dataset size and regional scope, the framework advances practical, interpretable, and regulatory-ready healthcare AI with potential for broader adoption in CKD screening and beyond.

Abstract

As the global population ages, the incidence of Chronic Kidney Disease (CKD) is rising. CKD often remains asymptomatic until advanced stages, which places a significant burden on both the healthcare system and patient quality of life. This research developed an explainable machine learning system for predicting CKD in patients with cardiovascular risks, using medical history and laboratory data. Among the models evaluated, the Random Forest achieved the highest sensitivity, 88.2%. The study introduces a comprehensive explainability framework that extends beyond traditional feature importance methods, incorporating global and local interpretations, bias inspection, biomedical relevance, and safety assessments. The key predictive features identified in the global interpretation were the use of diabetic and ACEI/ARB medications and initial eGFR values. The local interpretation provided insight into individual predictions through counterfactual explanations, which were consistent with the other components of the system. The bias inspection found that CKD predictions exhibited some bias with respect to initial eGFR values, but no significant gender bias was identified. The model's logic, extracted as scoped rules, was confirmed to align with existing medical literature. The safety assessment tested potentially dangerous cases and confirmed that the model behaved safely. This system enhances the explainability, reliability, and accountability of the model, supporting its potential integration into healthcare settings and compliance with upcoming regulatory standards, and it shows promise for broader applications in healthcare machine learning.
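The sensitivity figure reported above is the share of true CKD cases that the model flags, TP / (TP + FN), which is why it is the metric of choice when false negatives are the main concern in screening. A minimal sketch of that calculation follows; the function name and the label vectors are hypothetical illustrations, not data or code from the study.

```python
def sensitivity(y_true, y_pred):
    """Return TP / (TP + FN) for binary labels where 1 marks a CKD case."""
    # True positives: actual CKD cases that were predicted as CKD.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False negatives: actual CKD cases that the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive cases")
    return tp / (tp + fn)


# Hypothetical labels for illustration only (not taken from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
example = sensitivity(y_true, y_pred)
print(f"sensitivity = {example:.3f}")
```

Note that the false positive in the example (the fifth patient) does not affect the result: sensitivity only counts outcomes among actual positive cases, so it should be read alongside specificity or precision when judging a screening model.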

Paper Structure (17 sections, 5 figures, 3 tables)

This paper contains 17 sections, 5 figures, 3 tables.

Introduction
Methods
Machine Learning Model
Explainable System
Global Interpretation
Local Interpretation
Bias Inspection
Biomedical Relevance
Safety Assessment
Results
Global Interpretation
Local Interpretation
Bias Inspection
Biomedical Relevance
Safety Assessment
...and 2 more sections

Figures (5)

Figure 1: The explainable system
Figure 2: The SHAP summary plot for the random forest model
Figure 3: The SHAP summary plot shows the distribution of SHAP values for predictions of CKD.
Figure 4: The partial dependence plot of gender on the target
Figure 5: The partial dependence plot of eGFR on the target
