Understanding the Disparities in Mathematics Performance: An Interpretability-Based Examination

Ismael Gomez-Talal, Luis Bote-Curiel, Jose Luis Rojo-Alvarez

TL;DR

This study addresses international disparities in Mathematics performance by analyzing Spain-focused PISA data with interpretable machine learning. Students are grouped into three performance levels, and binary classifiers trained with stratified K-fold cross-validation are explained with SHAP. The analysis identifies reading access, critical thinking, gender, and geographic location as key determinants. The findings reveal strong regional disparities and socioeconomic influences, with reading materials and home resources linked to higher performance. The work demonstrates the value of interpretable models for informing targeted educational policies and motivates future work on global interpretability.

Abstract

Problem. Educational disparities in Mathematics performance are a persistent challenge. This study aims to unravel the complex factors contributing to these disparities among students internationally, with a focus on the interpretability of the contributing factors. Methodology. Utilizing data from the Programme for International Student Assessment (PISA), we conducted rigorous preprocessing and variable selection to prepare for applying binary classification interpretability models. These models were trained using the Stratified K-Fold technique to ensure balanced representation and assessed using six key metrics. Solution. By applying interpretability models such as Shapley Additive Explanations (SHAP) analysis, we identified critical factors impacting student performance, including reading accessibility, critical thinking skills, gender, and geographical location. Results. Our findings reveal significant disparities linked to resource availability, with students from lower socioeconomic backgrounds possessing fewer books and demonstrating lower performance in Mathematics. The geographical analysis highlighted regional educational disparities, with certain areas consistently underperforming in PISA assessments. Gender also emerged as a determinant, with females contributing differently to performance levels across the spectrum. Conclusion. The study provides insights into the multifaceted determinants of student Mathematics performance and suggests potential avenues for future research to explore global interpretability models and further investigate the socioeconomic, cultural, and educational factors at play.
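
To make the idea behind Shapley Additive Explanations concrete, the following is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. It is not the authors' pipeline: the scoring function `toy_score`, its three features, and the all-zero baseline are hypothetical, and practical SHAP tooling relies on approximations because exact enumeration grows as 2^n in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to baseline.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only works for a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_score(features):
    """Hypothetical score with an interaction between books and reading."""
    books, reading, region = features
    return 2.0 * books + 1.0 * reading + 0.5 * books * reading - 1.0 * region


x = [1.0, 1.0, 1.0]         # hypothetical student
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)
```

The attributions in `phi` sum to `toy_score(x) - toy_score(baseline)`, which is the additivity property that lets per-feature contributions be read as shares of a single prediction. The interaction term is split evenly between the two features involved.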

Paper Structure

This paper contains 19 sections, 9 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Distribution of rows by number of missing columns. The histogram shows how many student records in the PISA dataset have a given number of missing columns. A large concentration of rows has more than 400 missing columns, which motivates pruning the data to make the dataset more robust for subsequent analysis.
  • Figure 2: Comprehensive research workflow using the PISA dataset. The workflow proceeds through (1) preprocessing, (2) splitting the data into training and test sets, (3) addressing class imbalance with stratified 5-fold cross-validation and undersampling, (4) training various models with grid search optimization, (5) evaluating with metrics including ACC and AUC, and (6) applying SHAP for interpretability.
  • Figure 3: SHAP analysis comparing students with low and medium levels in Mathematics. Mean absolute SHAP values are shown for student 3629 (medium level) as a summary plot (a) and a decision plot (b), and for student 2537 (low level) as a summary plot (c) and a decision plot (d).
  • Figure 4: SHAP analysis comparing students with high and medium levels in Mathematics. Mean absolute SHAP values are shown for student 626 (high level) as a summary plot (a) and a decision plot (b), and for student 656 (medium level) as a summary plot (c) and a decision plot (d).
  • Figure 5: SHAP analysis comparing students with low and high levels in Mathematics. Mean absolute SHAP values are shown for student 885 (low level) as a summary plot (a) and a decision plot (b), and for student 1382 (high level) as a summary plot (c) and a decision plot (d).
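
Step (3) of the workflow in Figure 2 can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names `stratified_folds` and `undersample`, the 80/20 label counts, and the choice to undersample only the training portion of each fold are assumptions, since the caption does not say where in the loop undersampling is applied.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds that roughly keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def undersample(indices, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    smallest = min(len(members) for members in by_class.values())
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, smallest))
    return sorted(kept)


# Hypothetical imbalanced binary labels: 80 rows of class 0, 20 of class 1.
labels = [0] * 80 + [1] * 20
folds = stratified_folds(labels, k=5)

splits = []
for i, val_idx in enumerate(folds):
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    # Assumption: balance only the training rows; validation keeps the
    # original class ratio so that metrics reflect the real distribution.
    balanced_train = undersample(train_idx, labels, seed=i)
    splits.append((balanced_train, sorted(val_idx)))
```

Each `(balanced_train, val_idx)` pair can then be passed to a model-fitting and scoring routine. Leaving the validation fold at its original class ratio is a design choice that keeps ACC and AUC comparable to what would be seen on unseen data.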