Mutual Information Assisted Ensemble Recommender System for Identifying Critical Risk Factors in Healthcare Prognosis
Abhishek Dey, Debayan Goswami, Rahul Roy, Susmita Ghosh, Yu Shrike Zhang, Jonathan H. Chan
TL;DR
The paper tackles information overload in healthcare data by introducing a mutual information–based ensemble feature recommender that identifies clinically relevant risk factors for prognosis. It combines the outputs of eight feature-selection methods in a positional table and uses mutual information (MI) to assemble a robust ranked feature set, evaluated on four disease datasets including clear cell renal cell carcinoma (ccRCC). With a reduced feature subset, the approach identifies prognostic factors more reliably than four state-of-the-art methods and reaches high classification accuracy for ccRCC staging (96.6% with a support vector machine and 98.6% with a neural network), and the improvements are reported as statistically significant. The work advances decision support for healthcare prognosis and lays the groundwork for extensions to medical imaging and ROI-focused analyses.
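As a rough illustration of the idea summarized above, the sketch below computes empirical mutual information between discrete features and a class label, then merges several selector rankings through a positional table. It is not the authors' implementation: the function names, the toy data, the use of three rankings instead of eight, and the rule for resolving each table row (take the not-yet-ranked candidates in descending MI order) are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def ensemble_rank(rankings, mi_scores):
    """Merge ranked feature lists (best first) via a positional table.

    Row i of the table holds the features that each selector placed at
    position i. ASSUMPTION: within a row, candidates that are not yet
    ranked are appended in descending MI order (ties broken by name).
    """
    final = []
    depth = max(len(r) for r in rankings)
    for pos in range(depth):
        row = {r[pos] for r in rankings if pos < len(r)}
        candidates = sorted(row - set(final), key=lambda f: (-mi_scores[f], f))
        final.extend(candidates)
    return final


# Toy, made-up data: three binary features and a binary class label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "f_b": [0, 0, 0, 1, 1, 1, 1, 1],  # partially informative
    "f_c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}
mi_scores = {name: mutual_information(col, labels) for name, col in features.items()}

# Hypothetical rankings produced by three different feature selectors.
rankings = [
    ["f_b", "f_a", "f_c"],
    ["f_a", "f_b", "f_c"],
    ["f_a", "f_c", "f_b"],
]
final_ranking = ensemble_rank(rankings, mi_scores)
print(final_ranking)
```

In this toy case two selectors disagree about the top position, and the MI score decides the order in which the disputed features enter the final ranking.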
Abstract
Purpose: Health recommenders act as important decision support systems, helping patients and medical professionals take actions that lead to patients' well-being. These systems extract the information that is most relevant to the end user and help them make appropriate decisions. The present study proposes a feature recommender, as part of a disease management system, that identifies and recommends the most important risk factors for an illness.

Methods: A novel mutual information and ensemble-based feature ranking approach for identifying critical risk factors in healthcare prognosis is proposed.

Results: To establish the effectiveness of the proposed method, experiments were conducted on four benchmark datasets covering diverse diseases: clear cell renal cell carcinoma (ccRCC), chronic kidney disease, Indian liver patient, and cervical cancer risk factors. The performance of the proposed recommender is compared with four state-of-the-art methods using recommender-system performance metrics, namely average precision@K, precision@K, recall@K, F1@K, and reciprocal rank@K. The method recommends all relevant critical risk factors for ccRCC. It also attains higher accuracy for ccRCC staging (96.6% and 98.6% using a support vector machine and a neural network, respectively) with a reduced feature set compared to existing methods. Moreover, the top two features recommended by the proposed method for ccRCC, viz. size of tumor and metastasis status, are medically validated by the existing TNM system. Results are also found to be superior for the other three datasets.

Conclusion: The proposed recommender can identify and recommend the risk factors that have the most discriminating power for detecting diseases.
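The ranking metrics named in the abstract can be sketched as follows for a single ranked feature list and a set of known relevant features. This is a minimal illustration using common textbook definitions; the function names and the example data are assumptions, and the abstract does not state which normalization the paper uses for average precision@K (here the sum is divided by min(K, number of relevant features)).

```python
def hits_at_k(ranked, relevant, k):
    """Number of relevant features among the top-k recommendations."""
    return sum(1 for f in ranked[:k] if f in relevant)


def precision_at_k(ranked, relevant, k):
    return hits_at_k(ranked, relevant, k) / k


def recall_at_k(ranked, relevant, k):
    return hits_at_k(ranked, relevant, k) / len(relevant)


def f1_at_k(ranked, relevant, k):
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / position of the first relevant feature in the top k, else 0."""
    for i, f in enumerate(ranked[:k], start=1):
        if f in relevant:
            return 1.0 / i
    return 0.0


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@i over the relevant positions i <= k.

    ASSUMPTION: normalized by min(k, len(relevant)); other conventions exist.
    """
    hits, total = 0, 0.0
    for i, f in enumerate(ranked[:k], start=1):
        if f in relevant:
            hits += 1
            total += hits / i
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0


# Made-up example: five recommended features, three truly relevant ones.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
k = 3

p = precision_at_k(ranked, relevant, k)
r = recall_at_k(ranked, relevant, k)
f1 = f1_at_k(ranked, relevant, k)
rr = reciprocal_rank_at_k(ranked, relevant, k)
ap = average_precision_at_k(ranked, relevant, k)
print(p, r, f1, rr, ap)
```

Under these definitions, a recommender that places every relevant risk factor in the top K positions reaches a recall@K of 1.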
