Predicting Diabetic Retinopathy Using a Two-Level Ensemble Model
Mahyar Mahmoudi, Tieming Liu
TL;DR
This paper tackles the challenge of predicting diabetic retinopathy (DR) using non-image data from routine lab tests. It introduces a two-level ensemble framework that internally stacks multiple tuned configurations of four base models and then feeds their outputs into a Random Forest meta-learner, achieving strong predictive performance on a Cerner Health Facts dataset balanced with SMOTE. The approach delivers high accuracy (0.9433), ROC-AUC (0.9844), and AUPRC (0.9875), outpacing one-level stacking and FCN baselines while using a reduced feature set (6–7 features) for efficient clinical deployment. The work highlights the value of hierarchical stacking and feature selection in producing interpretable, resource-efficient DR risk predictions with potential applicability to other chronic diseases.
Abstract
Preprint Note: This is the author preprint version of a paper accepted for presentation at the IISE Annual Conference & Expo 2025. The final version will appear in the official proceedings.
Diabetic retinopathy (DR) is a leading cause of blindness in working-age adults, and current diagnostic methods rely on resource-intensive eye exams and specialized equipment. Image-based AI tools have shown limitations in early-stage detection, motivating the need for alternative approaches. We propose a non-image-based, two-level ensemble model for DR prediction using routine laboratory test results. In the first stage, base models (Linear SVC, Random Forest, Gradient Boosting, and XGBoost) are hyperparameter-tuned and internally stacked across different configurations to optimize metrics such as accuracy, recall, and precision. In the second stage, the first-stage predictions are aggregated using a Random Forest meta-learner. This hierarchical stacking strategy improves generalization, balances performance across multiple metrics, and remains computationally efficient compared to deep learning approaches. The model achieved an accuracy of 0.9433, F1 score of 0.9425, recall of 0.9207, precision of 0.9653, ROC-AUC of 0.9844, and AUPRC of 0.9875, surpassing one-level stacking and FCN baselines. These results highlight the model's potential for accurate and interpretable DR risk prediction in clinical settings.
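The two-stage design described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `StackingClassifier`, a synthetic dataset in place of the Cerner Health Facts data, hand-picked configurations instead of tuned ones, and a logistic-regression combiner inside each first-level stack (the abstract does not say how configurations are combined internally). XGBoost and SMOTE are left out to avoid extra dependencies. The helper name `inner_stack` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumption: synthetic stand-in for a small lab-test feature set (7 features).
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def inner_stack(name, configs):
    """Level 1: stack several configurations of a single model family.

    Assumption: a logistic regression combines the configurations; the
    source does not specify the internal combiner.
    """
    return StackingClassifier(
        estimators=[(f"{name}_{i}", est) for i, est in enumerate(configs)],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


# Assumption: illustrative hyperparameters, not the tuned values from the paper.
svc_stack = inner_stack(
    "svc",
    [make_pipeline(StandardScaler(), LinearSVC(C=c, dual=False)) for c in (0.1, 1.0)],
)
rf_stack = inner_stack(
    "rf",
    [RandomForestClassifier(n_estimators=50, max_depth=d, random_state=0) for d in (3, None)],
)
gb_stack = inner_stack(
    "gb",
    [GradientBoostingClassifier(n_estimators=50, learning_rate=lr, random_state=0) for lr in (0.05, 0.2)],
)
# An XGBoost family (xgboost.XGBClassifier) could be added in the same way.

# Level 2: a Random Forest meta-learner aggregates the level-1 stack outputs.
two_level = StackingClassifier(
    estimators=[("svc", svc_stack), ("rf", rf_stack), ("gb", gb_stack)],
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    cv=3,
)
two_level.fit(X_train, y_train)

pred = two_level.predict(X_test)
proba = two_level.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, pred)
roc_auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.4f}  roc_auc={roc_auc:.4f}")
```

Because each level is fitted on cross-validated predictions from the level below, the meta-learner never sees predictions made on a base model's own training folds, which is the usual guard against leakage in stacked ensembles. The scores printed here describe the synthetic data only and are not comparable to the results reported in the abstract.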
