An effect analysis of the balancing techniques on the counterfactual explanations of student success prediction models
Mustafa Cavus, Jakub Kuzilek
TL;DR
The paper investigates how data balancing affects counterfactual explanations for student success prediction models in higher education. It compares WhatIf, Multi-Objective Counterfactual Explanations (MOC), and Nearest Instance Counterfactual Explanations (NICE) across balancing strategies such as oversampling, undersampling, SMOTE, and cost-sensitive learning on the Open University Learning Analytics Dataset (OULAD) using Random Forest models. Counterfactual quality and model performance are assessed with metrics including accuracy, F1, AUC, and properties like validity, proximity, sparsity, and plausibility, revealing that NICE_sp and NICE_pr are the most robust on original data while balancing generally improves explanation realism. The findings inform trustworthy, actionable educational interventions and highlight balancing as a key factor in the deployment of explainable AI in education.
Abstract
Over the past decade, the use of digital solutions in higher education has grown rapidly. The large amounts of data generated as a result have made it possible to apply advanced data analysis methods to support learners and examine learning processes. One of the dominant research directions in learning analytics is predictive modeling of learners' success using various machine learning methods. To build learners' and teachers' trust in such methods and systems, it is necessary to explore methods and methodologies that enable relevant stakeholders to understand the underlying machine learning models in depth. In this context, counterfactual explanations from explainable machine learning tools are promising. Several counterfactual generation methods hold much promise, but to be effective, the features they change must be actionable and causal. It is therefore essential to determine which counterfactual generation method suits student success prediction models in terms of desiderata, stability, and robustness. Although a few studies on the use of counterfactual explanations in educational sciences have been published in recent years, they have yet to discuss which counterfactual generation method is more suitable for this problem. This paper analyzes the effectiveness of commonly used counterfactual generation methods, such as WhatIf Counterfactual Explanations, Multi-Objective Counterfactual Explanations, and Nearest Instance Counterfactual Explanations, after balancing the data. It presents a case study using the Open University Learning Analytics dataset to demonstrate the practical usefulness of counterfactual explanations. The results illustrate the method's effectiveness and describe concrete steps that could be taken to alter the model's prediction.
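To make the idea of a counterfactual explanation concrete, the following is a minimal sketch of a WhatIf-style search: among observed records that already receive the desired prediction, it picks the one closest to the student of interest. Everything in it is an illustrative assumption rather than the paper's setup: the two features (`vle_clicks`, `assessment_score`), the five toy records, the threshold rule standing in for a trained model, and the helper names `gower_numeric`, `whatif_counterfactual`, and `sparsity` are all hypothetical, and the distance is a simplified Gower distance restricted to numeric features.

```python
from typing import Callable, Optional, Sequence, Tuple

Row = Tuple[float, ...]


def gower_numeric(a: Row, b: Row, ranges: Sequence[float]) -> float:
    """Mean range-normalised absolute difference (Gower distance, numeric features only)."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(a)


def whatif_counterfactual(
    x: Row,
    data: Sequence[Row],
    predict: Callable[[Row], str],
    desired: str,
) -> Optional[Row]:
    """Return the observed row nearest to x whose prediction equals `desired`."""
    # Feature ranges for normalisation; fall back to 1.0 for constant features.
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*data)]
    candidates = [row for row in data if predict(row) == desired]
    if not candidates:
        return None
    return min(candidates, key=lambda row: gower_numeric(x, row, ranges))


def sparsity(x: Row, cf: Row) -> int:
    """Number of features that differ between the instance and its counterfactual."""
    return sum(1 for a, b in zip(x, cf) if a != b)


# Hypothetical toy records: (vle_clicks, assessment_score). Not OULAD data.
data = [(10, 40), (20, 55), (60, 45), (80, 70), (30, 65)]


def predict(row: Row) -> str:
    """Stand-in for a trained classifier (assumption: a simple score threshold)."""
    return "pass" if row[1] >= 60 else "fail"


x = (20, 55)  # a student currently predicted to fail
cf = whatif_counterfactual(x, data, predict, desired="pass")

ranges = [(max(col) - min(col)) or 1.0 for col in zip(*data)]
is_valid = cf is not None and predict(cf) == "pass"  # validity
proximity = gower_numeric(x, cf, ranges)              # lower means closer
changed = sparsity(x, cf)                             # fewer changes is sparser

print(cf, is_valid, round(proximity, 3), changed)
```

In this toy setting, the counterfactual reads as a statement of the form "had this student shown the activity and score of the nearest passing student, the prediction would have been pass". Validity, proximity, and sparsity are then computed on that pair. Balancing the training data changes which records exist and how the model labels them, which is one way it can change the counterfactuals a search like this returns.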
