Contrasting Prediction Methods for Early Warning Systems at Undergraduate Level
Emma Howard, Maria Meehan, Andrew Parnell
TL;DR
This paper addresses the early identification of at-risk students in a large undergraduate STEM course. It compares eight prediction methods that use fine-grained LMS data and cluster membership to predict final grades. It frames the problem as identifying an optimal mid-semester point (around weeks 5–6) that balances the time left for intervention against predictive accuracy. It shows that Bayesian Additive Regression Trees (BART) with cluster-informed features perform strongly, reaching a mean absolute error of about 6.5 percentage points by week 6. The study highlights continuous assessment as the strongest predictor, shows that clustering can reveal distinct engagement patterns, and provides an openly accessible R code repository for reproducibility. The findings have practical implications for scalable, targeted interventions in large STEM courses with substantial LMS content. Methodologically, the work combines fine-grained analytics, model-based clustering, and nonparametric tree ensembles. Overall, it offers a data-driven blueprint for deploying early warning systems in higher education while acknowledging course-specific limitations.
Abstract
In this study, we investigate prediction methods for an early warning system for a large undergraduate STEM course. Recent studies have provided evidence in favour of adopting early warning systems as a means of identifying at-risk students. Many of these early warning systems rely on data from students' engagement with Learning Management Systems (LMSs). Our study examines eight prediction methods and investigates the optimal time in a course to apply an early warning system. We present findings from a university statistics course that has a large proportion of its resources on the LMS Blackboard and weekly continuous assessment. We identify weeks 5–6 of our course (halfway through the semester) as an optimal time to implement an early warning system, as it leaves students time to change their study patterns while retaining reasonable prediction accuracy. Using detailed (fine-grained) variables, clustering, and our final prediction method, BART (Bayesian Additive Regression Trees), we can predict students' final grades by week 6 with a mean absolute error (MAE) of 6.5 percentage points. We provide our R code for implementing the prediction methods in a GitHub repository.
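To make the accuracy figure concrete: the MAE is the average absolute difference between each student's predicted and actual final grade, so an MAE of 6.5 means predictions are off by 6.5 percentage points on average. The sketch below computes it in Python (the authors' own code is in R); the grades are hypothetical values for illustration only, not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted grades (in %)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical final grades (%) for five students and hypothetical week-6 predictions.
actual = [72.0, 55.0, 38.0, 90.0, 64.0]
predicted = [65.0, 60.0, 45.0, 84.0, 70.0]

mae = mean_absolute_error(actual, predicted)
print(f"MAE = {mae:.1f} percentage points")  # absolute errors 7, 5, 7, 6, 6 -> MAE 6.2
```

Because MAE is expressed in the same units as the grades, an error of this size is directly interpretable for deciding how close to a pass or fail boundary a prediction must be before a student is flagged.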
