Rashomon effect in Educational Research: Why More is Better Than One for Measuring the Importance of the Variables?
Jakub Kuzilek, Mustafa Çavuş
TL;DR
The paper examines how the Rashomon effect influences variable importance in educational data mining by constructing a Rashomon set of $424$ tree-based models (decision trees, random forests, LightGBM, XGBoost) with a Rashomon parameter of $\epsilon=0.05$ on the Open University Learning Analytics Dataset (OULAD). It evaluates predictive performance and variable importance using Permutation Variable Importance (PVI) and the Variable Importance Order Discrepancy (VIOD), measured via Kendall's $\tau$. The Rashomon set improves predictive accuracy by $2$-$6\%$, and variable importance is more stable in binary than in multiclass classification. The demographic variables imd_band and highest_education consistently matter across courses, though their rankings vary by course; course DDD shows notable instability, which points to context-specific dynamics. The study underscores the need to consider multiple well-performing models when interpreting demographic effects, advocates caution in generalizing results, and provides code for reproducing the experiments.
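For orientation, one common way to formalize the two quantities named above is given here. This is a sketch under assumptions: the symbols $\mathcal{F}$ (the model class), $L$ (a loss such as $1 - \text{accuracy}$), $f^{*}$ (the best model found), and $p$ (the number of variables) are introduced only for illustration, and the paper's exact definitions may differ, for example by using a relative rather than an absolute tolerance.

$$\mathcal{R}_{\epsilon} = \{\, f \in \mathcal{F} : L(f) \le L(f^{*}) + \epsilon \,\}, \qquad \epsilon = 0.05$$

$$\tau = \frac{C - D}{\binom{p}{2}}$$

Here $C$ and $D$ count the pairs of variables that two models rank in the same order and in opposite order, respectively, assuming no ties. A value of $\tau = 1$ means two models in $\mathcal{R}_{\epsilon}$ agree completely on the importance order, and $\tau = -1$ means they rank the variables in exactly reversed order.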
Abstract
This study explores how the Rashomon effect influences variable importance when student demographics are used to predict academic outcomes. Our research follows the way machine learning algorithms are typically employed in Educational Data Mining and highlights the so-called Rashomon effect. The study uses a Rashomon set of simple-yet-accurate models trained with decision tree, random forest, LightGBM, and XGBoost algorithms on the Open University Learning Analytics Dataset. We found that the Rashomon set improves predictive accuracy by 2-6%. Variable importance analysis revealed more consistent and reliable results for binary classification than for multiclass classification, highlighting the complexity of predicting multiple outcomes. The key demographic variables imd_band and highest_education were identified as vital, but their importance varied across courses, especially in course DDD. These findings underscore the importance of model choice and the need for caution in generalizing results, as different models can lead to different variable importance rankings. The code for reproducing the experiments is available in the repository: https://anonymous.4open.science/r/JEDM_paper-DE9D.
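To make the idea concrete, the following minimal sketch selects a Rashomon set from a handful of models and compares their variable importance orders with Kendall's tau. It is not the authors' implementation: the function names, the absolute-tolerance selection rule, and the pairwise comparison are assumptions, and every number is made up for illustration rather than taken from the study.

```python
from itertools import combinations


def rashomon_set(accuracy, epsilon):
    """Return the names of models whose accuracy is within epsilon of the best.

    Assumption: an absolute tolerance on accuracy; the paper's rule may differ.
    """
    best = max(accuracy.values())
    return [name for name, acc in accuracy.items() if acc >= best - epsilon]


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists (assumes no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def pairwise_tau(importance, models):
    """Kendall's tau for every pair of models in the given list."""
    return {
        (a, b): kendall_tau(importance[a], importance[b])
        for a, b in combinations(models, 2)
    }


# Hypothetical, made-up values (NOT results from the study).
features = ["imd_band", "highest_education", "age_band", "gender"]
accuracy = {"dt": 0.70, "rf": 0.74, "lgbm": 0.73, "xgb": 0.66}
importance = {  # one importance score per feature, in the order of `features`
    "dt": [0.30, 0.20, 0.10, 0.05],
    "rf": [0.25, 0.28, 0.08, 0.04],
    "lgbm": [0.05, 0.10, 0.20, 0.30],
    "xgb": [0.10, 0.30, 0.20, 0.05],
}

members = rashomon_set(accuracy, epsilon=0.05)
taus = pairwise_tau(importance, members)

print("Rashomon set:", members)
for pair, tau in taus.items():
    print(pair, round(tau, 3))
```

In this toy setup, models with nearly identical accuracy can still order the variables very differently, which is the situation the study warns about when conclusions are drawn from a single model.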
