Exploring the Relationship Between Feature Attribution Methods and Model Performance

Priscylla Silva; Claudio T. Silva; Luis Gustavo Nonato

TL;DR

The paper studies explainability in educational prediction by asking whether higher predictive performance goes along with stronger consensus among feature attribution methods. It frames student-success prediction as a binary classification task and combines nine attribution methods, four (dis)agreement metrics, and two real-world datasets, using model snapshots taken at intermediate training epochs to quantify how explanation agreement evolves with model quality. The study finds a very strong Spearman correlation between AUC and agreement among the methods: better-performing models yield more consistent explanations. This points to a close link between model performance and the consistency of explanations, with practical implications for selecting models and interpreting predictions in education, and it supports preferring high-AUC models when decisions in educational settings rely on explanations.

Abstract

Machine learning and deep learning models are pivotal in educational contexts, particularly in predicting student success. Despite their widespread application, a significant gap persists in comprehending the factors influencing these models' predictions, especially in explainability within education. This work addresses this gap by employing nine distinct explanation methods and conducting a comprehensive analysis to explore the correlation between the agreement among these methods in generating explanations and the predictive model's performance. Applying Spearman's correlation, our findings reveal a very strong correlation between the model's performance and the agreement level observed among the explanation methods.
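
The analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the agreement measure used here (the fraction of top-k features shared by two attribution vectors) is an assumed stand-in for the paper's four (dis)agreement metrics, the function names are hypothetical, and the AUC values and attribution vectors are made-up toy numbers. The sketch computes the mean pairwise agreement among explanation methods for each model snapshot, then the Spearman correlation between snapshot AUC and that agreement.

```python
from itertools import combinations
from statistics import mean


def top_k_features(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    order = sorted(range(len(attributions)),
                   key=lambda i: abs(attributions[i]), reverse=True)
    return set(order[:k])


def feature_agreement(a, b, k):
    """Fraction of top-k features shared by two attribution vectors (assumed metric)."""
    return len(top_k_features(a, k) & top_k_features(b, k)) / k


def mean_pairwise_agreement(explanations, k):
    """Average feature agreement over all pairs of explanation methods."""
    return mean(feature_agreement(a, b, k)
                for a, b in combinations(explanations, 2))


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (made up): one entry per model snapshot, holding its AUC and the
# attribution vectors that three hypothetical methods assign to four features.
snapshots = [
    (0.60, [[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.8, 0.2], [0.1, 0.0, 0.0, 0.9]]),
    (0.75, [[0.7, 0.3, 0.0, 0.0], [0.6, 0.0, 0.4, 0.0], [0.5, 0.4, 0.1, 0.0]]),
    (0.90, [[0.8, 0.2, 0.0, 0.0], [0.7, 0.3, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0]]),
]

aucs = [auc for auc, _ in snapshots]
agreements = [mean_pairwise_agreement(expl, k=2) for _, expl in snapshots]
rho = spearman(aucs, agreements)
print(f"agreement per snapshot: {agreements}")
print(f"Spearman rho between AUC and agreement: {rho:.2f}")
```

In this toy setup agreement rises monotonically with AUC, so the rank correlation is perfect by construction; real snapshots would be expected to give a noisier value. Spearman's coefficient suits this analysis because it only assumes a monotonic relationship between performance and agreement, not a linear one.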

Paper Structure (15 sections, 4 equations, 6 figures, 1 table)

Introduction
Methodology
Problem Formulation
Experimental Setup
Model Performance in Intermediate Epochs
(Dis)agreement measurement
Correlation Analysis
Results and Discussions
Discussion and Conclusion
Disagreement Metrics
Explanation Methods
Datasets
Data Distribution of the (dis)agreement score
Example of (dis)agreement between the pairs of methods
AUC vs Disagreement Level

Figures (6)

Figure 1: Correlation between Model Performance (AUC) and (Dis)agreement Metrics for Models Trained on the Introductory Programming Course Dataset.
Figure 2: Correlation between Model Performance (AUC) and (Dis)agreement Metrics for Models Trained on Amrieh_2015's Dataset.
Figure 3: Boxplots illustrating the distribution of the disagreement level score by FA metric.
Figure 4: Heatmap illustrating the (dis)agreement levels between explanation methods.
Figure 5: Correlation between Model Performance (AUC) and Disagreement Metrics for Models Trained on the Introductory Programming Course Dataset.
...and 1 more figure
