Towards An Online Incremental Approach to Predict Students Performance
Chahrazed Labba, Anne Boyer
TL;DR
The paper tackles student performance prediction on streaming data using online incremental learning with a memory-based rehearsal approach under a fixed memory budget. It introduces a genetic algorithm (GA) that builds diverse, class-balanced exemplar sets from old and new data, addressing the NP-hard exemplar selection problem and avoiding the instability of random sampling. On the OULAD dataset, the GA-based method achieves up to ~10% higher accuracy than random exemplar selection while keeping the standard deviation of accuracy low (1%–2.1%). The approach makes online models more robust under memory limits and class imbalance and may apply to broader online learning tasks; future work is expected to explore additional metrics and models.
Abstract
Analytical models developed in offline settings with pre-prepared data are typically used to predict students' performance. However, when data become available over time, this learning method is no longer suitable. Online learning is increasingly used to update models from stream data. A rehearsal technique is typically used, which entails re-training the model on a small training set that is updated each time new data are received. The main challenge in this regard is constructing the training set from appropriate data samples so that the model maintains good performance. Typically, samples are selected at random, which can degrade the model's performance. In this paper, we propose a memory-based online incremental learning approach for updating an online classifier that predicts student performance using stream data. The approach uses a genetic algorithm heuristic to build the training set while respecting the memory space constraint and the balance of class labels. In contrast to random selection, our approach improves the stability of the analytical model by promoting diversity when creating the training set. As a proof of concept, we applied it to the open dataset OULAD. Our approach improves model accuracy by nearly 10% compared to the current state of the art, while maintaining a relatively low standard deviation in accuracy, ranging from 1% to 2.1%.
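
To make the exemplar selection idea concrete, the following is a minimal Python sketch of a genetic algorithm that picks a class-balanced training set under a memory limit. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the fitness function, the chromosome encoding, or the genetic operators, so the mean pairwise Euclidean distance used here as a diversity score, the equal per-class quota, and all names (`select_exemplars`, `diversity`, `pool`, `memory_size`) are hypothetical. The `pool` is assumed to already merge the stored exemplars with the newly received batch, grouped by class label.

```python
import math
import random


def diversity(samples):
    """Mean pairwise Euclidean distance (an assumed stand-in for 'diversity')."""
    n = len(samples)
    if n < 2:
        return 0.0
    total = sum(math.dist(samples[i], samples[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def fitness(chrom, pool):
    """Sum of per-class diversity; chrom maps label -> indices into pool[label]."""
    return sum(diversity([pool[label][i] for i in idxs])
               for label, idxs in chrom.items())


def crossover(a, b, per_class, rng):
    """Child draws per_class indices per label from the union of both parents."""
    return {label: rng.sample(sorted(set(a[label]) | set(b[label])), per_class)
            for label in a}


def mutate(chrom, pool, rate, rng):
    """Swap each selected index for an unselected one with probability `rate`."""
    for label, idxs in chrom.items():
        unused = [i for i in range(len(pool[label])) if i not in idxs]
        for pos in range(len(idxs)):
            if unused and rng.random() < rate:
                k = rng.randrange(len(unused))
                idxs[pos], unused[k] = unused[k], idxs[pos]
    return chrom


def select_exemplars(pool, memory_size, generations=30, pop_size=20,
                     rate=0.1, seed=0):
    """Pick a class-balanced exemplar set of at most memory_size samples."""
    rng = random.Random(seed)
    # Equal quota per class, capped by the smallest class, keeps labels balanced.
    per_class = min(memory_size // len(pool),
                    min(len(xs) for xs in pool.values()))
    population = [{label: rng.sample(range(len(xs)), per_class)
                   for label, xs in pool.items()} for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, pool), reverse=True)
        parents = population[:pop_size // 2]  # elitism: the best half survives
        children = [mutate(crossover(*rng.sample(parents, 2), per_class, rng),
                           pool, rate, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=lambda c: fitness(c, pool))
    return {label: [pool[label][i] for i in idxs]
            for label, idxs in best.items()}


# Toy pool: stored exemplars merged with a new batch, grouped by label.
data_rng = random.Random(1)
pool = {
    "pass": [(data_rng.random(), data_rng.random()) for _ in range(12)],
    "fail": [(data_rng.random(), data_rng.random()) for _ in range(7)],
}
exemplars = select_exemplars(pool, memory_size=8)
print({label: len(xs) for label, xs in exemplars.items()})
```

With a memory budget of 8 and two classes, each class receives a quota of 4 samples, so the selected set stays within the budget and balanced regardless of how skewed the pool is. In a rehearsal loop, the returned exemplars would be used to re-train the classifier and then stored as the "old" data for the next batch. Fixing `seed` makes the selection reproducible, which is one simple way to compare its run-to-run variability against purely random sampling.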
