NAWOA-XGBoost: A Novel Model for Early Prediction of Academic Potential in Computer Science Students
Junhao Wei, Yanzhao Gu, Ran Zhang, Mingjing Huang, Jinhong Song, Yanxiao Li, Wenxuan Zhu, Yapeng Wang, Zikun Li, Zhiwen Wang, Xu Yang, Ngai Cheong
TL;DR
The paper tackles early prediction of computer science students' academic potential by improving hyperparameter optimization for XGBoost with a novel Nonlinear Adaptive Whale Optimization Algorithm (NAWOA). NAWOA integrates multiple strategies (Good Nodes Set initialization, Leader-Followers Foraging, Dynamic Encircling Prey, Triangular Hunting, and a nonlinear convergence factor) to improve exploration, exploitation, and convergence stability. The authors validate NAWOA on 23 benchmark functions and on a real dataset of Computer Science undergraduates from Macao Polytechnic University (MPU), where NAWOA-XGBoost outperforms both plain XGBoost and WOA-XGBoost on accuracy, macro F1, AUC, and G-Mean, indicating strong adaptability on multi-class imbalanced data. These results highlight the approach's practical potential for educational assessment and personalized intervention in higher education.
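To make the role of the convergence factor concrete, the sketch below compares the linear schedule used in the standard WOA, where the factor a falls from 2 to 0 over the run, with a nonlinear alternative. The cosine form and the function names `linear_a` and `nonlinear_a` are assumptions chosen purely for illustration; the summary above does not give NAWOA's actual formula.

```python
import math


def linear_a(t: int, max_iter: int) -> float:
    """Standard WOA schedule: a decreases linearly from 2 to 0."""
    return 2.0 * (1.0 - t / max_iter)


def nonlinear_a(t: int, max_iter: int) -> float:
    """Hypothetical cosine schedule (an assumption, not the paper's formula).

    It also runs from 2 to 0, but stays higher than the linear schedule in
    the first half of the run and lower in the second half.
    """
    return 1.0 + math.cos(math.pi * t / max_iter)


if __name__ == "__main__":
    max_iter = 100
    # Print both schedules at a few checkpoints for comparison.
    for t in (0, 25, 50, 75, 100):
        print(t, round(linear_a(t, max_iter), 3), round(nonlinear_a(t, max_iter), 3))
```

In WOA-style optimizers a larger a generally favours exploration and a smaller a favours exploitation, so a schedule shaped like this one would spend more of the early iterations searching broadly and tighten more quickly near the end.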
Abstract
The Whale Optimization Algorithm (WOA) suffers from limited global search ability, slow convergence, and a tendency to fall into local optima, which restricts its effectiveness in hyperparameter optimization for machine learning models. To address these issues, this study proposes a Nonlinear Adaptive Whale Optimization Algorithm (NAWOA), which integrates strategies such as Good Nodes Set initialization, Leader-Followers Foraging, Dynamic Encircling Prey, Triangular Hunting, and a nonlinear convergence factor to enhance exploration, exploitation, and convergence stability. Experiments on 23 benchmark functions demonstrate NAWOA's superior optimization capability and robustness. Based on this optimizer, an NAWOA-XGBoost model was developed to predict academic potential using data from 495 Computer Science undergraduates at Macao Polytechnic University (2009-2019). Results show that NAWOA-XGBoost outperforms traditional XGBoost and WOA-XGBoost across key metrics, achieving an Accuracy of 0.8148, a Macro F1 of 0.8101, an AUC of 0.8932, and a G-Mean of 0.8172, which demonstrates strong adaptability on multi-class imbalanced datasets.
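For readers less familiar with the imbalance-aware metrics reported above, here is a minimal sketch of how accuracy, macro F1, and a multi-class G-Mean can be computed from predicted labels. It assumes the common definitions (macro F1 as the unweighted mean of per-class F1, G-Mean as the geometric mean of per-class recall); the paper's exact definitions may differ, and the function names and toy labels are illustrative only.

```python
import math
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample was not p
            fn[t] += 1  # a sample of class t was missed
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multi-class definition)."""
    tp, _, fn = per_class_counts(y_true, y_pred)
    # Only classes that actually occur in y_true have a defined recall.
    recalls = [tp[c] / (tp[c] + fn[c]) for c in sorted(set(y_true))]
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    # Toy imbalanced three-class example (made-up labels, not the MPU data).
    y_true = [0, 0, 0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), g_mean(y_true, y_pred))
```

Because G-Mean multiplies the per-class recalls, a model that ignores one minority class scores zero regardless of its overall accuracy, which is why it is often reported alongside macro F1 on imbalanced data.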
