Exploring the Relationship Between Feature Attribution Methods and Model Performance
Priscylla Silva, Claudio T. Silva, Luis Gustavo Nonato
TL;DR
The paper examines explainability in educational prediction by asking whether higher predictive performance goes together with stronger consensus among feature attribution methods. It frames student-success prediction as a binary classification task and combines nine attribution techniques, four (dis)agreement metrics, and two real-world datasets, using model snapshots saved at intermediate training epochs to quantify how explanation agreement changes with model quality. The study finds a very strong Spearman correlation between AUC and agreement across methods: better-performing models yield more consistent explanations. This points to a close link between model performance and the consistency of explanations, with practical implications for model selection and for the reliability of explanation-driven decisions in educational settings.
Abstract
Machine learning and deep learning models are pivotal in educational contexts, particularly in predicting student success. Despite their widespread use, a significant gap remains in understanding the factors that influence these models' predictions, particularly with respect to explainability in education. This work addresses this gap by applying nine distinct explanation methods and analyzing how the agreement among the explanations they generate relates to the predictive model's performance. Using Spearman's correlation, we find a very strong correlation between the model's performance and the level of agreement among the explanation methods.
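The analysis described in the abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation. The abstract does not name the four (dis)agreement metrics, so top-k feature agreement is assumed here as one common choice. The function names, the value of k, and the three snapshots with their AUC values and attribution vectors are all hypothetical placeholders, not results from the paper.

```python
from itertools import combinations
from statistics import mean


def top_k_features(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    order = sorted(range(len(attributions)), key=lambda i: -abs(attributions[i]))
    return set(order[:k])


def feature_agreement(attr_a, attr_b, k):
    """Fraction of top-k features shared by two attribution vectors."""
    return len(top_k_features(attr_a, k) & top_k_features(attr_b, k)) / k


def mean_pairwise_agreement(attribution_vectors, k):
    """Average top-k agreement over all pairs of explanation methods."""
    pairs = combinations(attribution_vectors, 2)
    return mean(feature_agreement(a, b, k) for a, b in pairs)


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: one (AUC, attributions) pair per model snapshot.
# Each inner list is the attribution vector of one explanation method
# for the same instance; real use would average over many instances.
snapshots = [
    (0.60, [[0.9, 0.1, 0.2, 0.0], [0.0, 0.8, 0.1, 0.3], [0.1, 0.2, 0.0, 0.7]]),
    (0.75, [[0.9, 0.1, 0.2, 0.0], [0.7, 0.6, 0.1, 0.0], [0.1, 0.5, 0.6, 0.0]]),
    (0.90, [[0.9, 0.1, 0.2, 0.0], [0.8, 0.0, 0.3, 0.1], [0.7, 0.4, 0.3, 0.0]]),
]

aucs = [auc for auc, _ in snapshots]
agreements = [mean_pairwise_agreement(attrs, k=2) for _, attrs in snapshots]
rho = spearman(aucs, agreements)
print(f"agreement per snapshot: {agreements}")
print(f"Spearman rho between AUC and agreement: {rho:.3f}")
```

Because the toy attributions were constructed so that agreement rises with AUC, the correlation here is perfect by design. With real snapshots, the attribution vectors would come from the explanation methods themselves, and a library routine such as `scipy.stats.spearmanr` could replace the hand-written `spearman` helper.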
