Performance Analysis of Machine Learning Algorithms in Chronic Kidney Disease Prediction

Iftekhar Ahmed, Tanzil Ebad Chowdhury, Biggo Bushon Routh, Nafisa Tasmiya, Shadman Sakib, Adil Ahmed Chowdhury

TL;DR

This paper tackles chronic kidney disease (CKD) prediction with machine learning by evaluating eight supervised classifiers on a CKD dataset from the UCI repository. After preprocessing the data with missing-value imputation and feature encoding, the models are trained and assessed using MAE, RMSE, precision, recall, and F1, along with ROC analysis. The key finding is that Random Forest and Logistic Regression achieve about 99% accuracy, KNN performs poorly at 73%, and the other classifiers reach roughly 95–97%. The work demonstrates the potential of CAD systems for CKD risk assessment and highlights promising classifiers for clinical deployment, while noting the need for broader validation across diverse datasets.
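To make the evaluation metrics named above concrete, the following is a minimal sketch of how they can be computed for a binary CKD / not-CKD classifier. The function name `binary_metrics` and the toy label vectors are illustrative assumptions, not taken from the paper. With 0/1 labels, MAE equals the misclassification rate and RMSE is its square root.

```python
import math

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, MAE and RMSE for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = len(pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    mae = sum(abs(t - p) for t, p in pairs) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)

    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mae": mae,
        "rmse": rmse,
    }

# Hypothetical labels (1 = CKD, 0 = not CKD); not data from the paper.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On this toy example, two of the eight predictions are wrong, so accuracy is 0.75 and MAE is 0.25.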

Abstract

The kidneys are the filters of the human body. About 10% of the global population is thought to be affected by Chronic Kidney Disease (CKD), which causes kidney function to decline. To protect at-risk patients from additional kidney damage, effective risk evaluation of CKD and appropriate CKD monitoring are crucial. Because of their quick and precise detection capabilities, machine learning (ML) models can help practitioners accomplish this goal efficiently; as a result, a large number of diagnostic systems and processes in the healthcare sector now rely on machine learning for disease prediction. In this study, we designed and propose disease-predictive computer-aided designs for the diagnosis of CKD. The CKD dataset is obtained from the UCI Machine Learning Repository and contains a few missing values, which are filled in using "mean-mode" and "random sampling" strategies. After imputing the missing data, eight ML techniques (Random Forest, SVM, Naive Bayes, Logistic Regression, KNN, XGBoost, Decision Tree, and AdaBoost) were used to build models, and their accuracies were compared to identify the best-performing models. Among them, Random Forest and Logistic Regression achieved an outstanding 99% accuracy, followed by AdaBoost, XGBoost, Naive Bayes, Decision Tree, and SVM, whereas the KNN classifier ranked last with an accuracy of 73%.
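The two imputation strategies mentioned in the abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation. It assumes "mean-mode" means filling numeric columns with the mean and categorical columns with the mode, and that "random sampling" means filling each gap with a randomly drawn observed value from the same column. The function names and toy columns are hypothetical.

```python
import random
from statistics import mean, mode

def impute_mean_mode(values, numeric):
    """Fill None entries with the column mean (numeric) or mode (categorical)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]

def impute_random_sample(values, rng):
    """Fill each None entry with a value drawn at random from the observed entries."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

# Hypothetical toy columns; None marks a missing value.
age = [48, None, 62, 51, None]
appetite = ["good", "poor", None, "good", "good"]

age_filled = impute_mean_mode(age, numeric=True)             # gaps -> mean of 48, 62, 51
appetite_filled = impute_mean_mode(appetite, numeric=False)  # gap -> most frequent label
age_sampled = impute_random_sample(age, random.Random(0))    # gaps -> random observed ages
```

Mean-mode filling preserves each column's central value but shrinks its variance, whereas random sampling keeps the observed distribution at the cost of adding noise. That trade-off is one plausible reason to use both.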

Paper Structure

This paper contains 20 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Proposed Workflow for Chronic Kidney Disease Prediction
  • Figure 2: Overview of the chronic kidney disease (CKD) dataset
  • Figure 3: Numerical feature distribution of CKD dataset
  • Figure 4: ROC curve of the ML classifiers
  • Figure 5: Confusion matrices of ML classifiers in CKD prediction
  • ...and 2 more figures