Dataset Optimization for Chronic Disease Prediction with Bio-Inspired Feature Selection
Abeer Dyoub, Ivan Letteri
TL;DR
This work tackles high-dimensional medical data for chronic disease prediction by applying three bio-inspired feature selection (FS) methods: Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Whale Optimization Algorithm (WOA). Using a KNN-based fitness evaluator with a single objective weighted almost entirely toward accuracy (α ≈ 0.99), the authors assess feature-subset performance across five datasets (diabetes, Pima Indian, breast cancer, kidney disease, heart failure) and multiple classifiers. The results show that bio-inspired FS can substantially reduce feature counts while maintaining or improving predictive accuracy; GA most often delivers the strongest gains, and training time drops notably in several cases. These findings support the potential of FS-driven, interpretable, and more efficient predictive analytics to aid early intervention and precision medicine in chronic-disease care. They also highlight dataset-dependent variability and the need for multi-objective optimization in future work.
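To make the KNN-based fitness evaluation more concrete, the following is a minimal sketch in standard-library Python. It is not the authors' implementation: the exact fitness formula is not given here, so the weighted form `alpha * accuracy + (1 - alpha) * reduction`, the leave-one-out evaluation, the choice `k=3`, and the toy data are all assumptions made for illustration. Only the KNN evaluator and the weight α ≈ 0.99 come from the summary above.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def loo_accuracy(X, y, k=3):
    """Leave-one-out KNN accuracy (an assumed evaluation protocol)."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)


def fitness(mask, X, y, alpha=0.99, k=3):
    """Assumed fitness: alpha * accuracy + (1 - alpha) * fraction of features dropped."""
    selected = [j for j, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # an empty subset cannot classify anything
    X_sub = [[row[j] for j in selected] for row in X]
    accuracy = loo_accuracy(X_sub, y, k)
    reduction = 1 - len(selected) / len(mask)
    return alpha * accuracy + (1 - alpha) * reduction


# Toy data (hypothetical): feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(0)
X = [[c * 10 + rng.random(), rng.random(), rng.random()] for c in (0, 1) for _ in range(10)]
y = [c for c in (0, 1) for _ in range(10)]

print(fitness([1, 1, 1], X, y))  # all features kept
print(fitness([1, 0, 0], X, y))  # only the informative feature kept
```

A GA, PSO, or WOA search would call a function like `fitness` on each candidate binary mask and keep the highest-scoring one. With α this close to 1, the feature-count term mainly acts as a tie-breaker between subsets of similar accuracy.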
Abstract
In this study, we investigated the application of bio-inspired optimization algorithms, including Genetic Algorithm, Particle Swarm Optimization, and Whale Optimization Algorithm, for feature selection in chronic disease prediction. The primary goal was to enhance the predictive accuracy of models, reduce data dimensionality, and make predictions more interpretable and actionable. The research comprised a comparative analysis of the three bio-inspired feature selection approaches across diverse chronic diseases, including diabetes, cancer, kidney disease, and cardiovascular disease. Performance metrics such as accuracy, precision, recall, and F1 score were used to assess how effectively each algorithm reduced the number of features needed for accurate classification. Overall, the results demonstrate that the bio-inspired optimization algorithms are effective at this reduction, although their performance varied across datasets. The study also highlights the importance of data pre-processing and cleaning in ensuring the reliability and effectiveness of the analysis. This work contributes to the advancement of predictive analytics for chronic diseases. Its potential impact extends to early intervention, precision medicine, and improved patient outcomes, opening new avenues for healthcare services tailored to individual needs. The findings underscore the potential benefits of bio-inspired feature selection for chronic disease prediction and offer insights for improving healthcare outcomes.
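For readers less familiar with the four metrics named in the abstract, the sketch below computes them from confusion counts for a binary task. The function name `binary_metrics` and the labels are hypothetical and chosen only for illustration; they are not results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In medical screening settings, recall is often the metric watched most closely, since a false negative corresponds to a missed case; reporting all four guards against a feature subset that looks accurate only because the classes are imbalanced.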
