
Towards Human-Centered Early Prediction Models for Academic Performance in Real-World Contexts

Han Zhang, Yiyi Ren, Paula S. Nurius, Jennifer Mankoff, Anind K. Dey

TL;DR

This paper tackles early prediction of academic performance in real-world settings. It compares three approaches: Logistic Regression (LR), a one-dimensional convolutional neural network (1D-CNN), and a multi-task 1D-CNN (MTL-1D-CNN). Each is trained on passive behavioral data and self-reports collected within the first week of two Spring terms. The paper emphasizes human-centered machine learning (HCML) principles (explainability, fairness, and generalizability). It shows that students can be classified as high or low performers, using a GPA threshold of $3.2$, as early as Week 1, and it highlights trade-offs among these principles. LR and 1D-CNN achieve high early accuracy and reveal interpretable predictors, although the deep learning approaches struggle with fairness and generalizability. MTL-1D-CNN offers stronger cross-year robustness but still faces fairness and explainability challenges. The findings underscore the socio-technical challenges of deploying such systems and advocate human-centered, privacy-preserving, and governance-aware integration into multi-stakeholder student support workflows to enable timely, equitable interventions.
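The TL;DR turns end-of-term GPA into a binary high/low label using a threshold of 3.2. The following is a minimal sketch of that labeling step. It assumes that a GPA at or above 3.2 counts as a high performer; the summary gives only the threshold value, not whether the boundary is inclusive. `label_performers`, `class_balance`, and the GPAs in the example are hypothetical and are not data from the study.

```python
# Minimal sketch: binarize end-of-term GPA into high (1) / low (0) performers.
# ASSUMPTION: the boundary is inclusive (GPA >= 3.2 -> high performer).
GPA_THRESHOLD = 3.2


def label_performers(gpas, threshold=GPA_THRESHOLD):
    """Return 1 for high performers and 0 for low performers."""
    return [1 if gpa >= threshold else 0 for gpa in gpas]


def class_balance(labels):
    """Return (n_high, n_low, share_high) for a list of 0/1 labels."""
    n_high = sum(labels)
    n_low = len(labels) - n_high
    return n_high, n_low, n_high / len(labels)


# Hypothetical GPAs, for illustration only.
labels = label_performers([3.9, 3.2, 3.19, 2.5, 3.6])
print(labels)                 # [1, 1, 0, 0, 1]
print(class_balance(labels))  # (3, 2, 0.6)
```

Checking the class balance matters here because the split is uneven in both terms. In Figure 5, roughly three quarters of the 2018 students fall above the threshold.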

Abstract

Supporting student success requires collaboration among multiple stakeholders. Researchers have explored machine learning models for academic performance prediction, yet key challenges remain in ensuring these models are interpretable, equitable, and actionable within real-world educational support systems. First, many models prioritize predictive accuracy but overlook human-centered machine learning principles, which limits trust among students and reduces the models' usefulness for educators and institutional decision-makers. Second, most models require at least a month of data before making reliable predictions, delaying opportunities for early intervention. Third, current models rely primarily on sporadically collected, classroom-derived data, missing broader behavioral patterns that could provide more continuous and actionable insights. To address these gaps, we present three modeling approaches (LR, 1D-CNN, and MTL-1D-CNN) to classify students as low or high academic performers. We evaluate them on explainability, fairness, and generalizability to assess their alignment with key social values. Using behavioral and self-reported data collected within the first week of two Spring terms, we demonstrate that these models can identify at-risk students as early as Week 1. However, trade-offs across human-centered machine learning principles highlight the complexity of designing predictive models that effectively support multi-stakeholder decision-making and intervention strategies. We discuss these trade-offs and their implications for different stakeholders, outlining how predictive models can be integrated into student support systems. Finally, we examine broader socio-technical challenges in deploying these models and propose future directions for advancing human-centered, collaborative academic prediction systems.

Paper Structure

This paper contains 70 sections, 3 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Overview of the full modeling pipeline for the three approaches. All three approaches use the same data sources and extracted features, but the LR approach applies different data pre-processing and modeling techniques from the 1D-CNN and MTL-1D-CNN approaches.
  • Figure 2: Overview of the training (highlighted in light gray) and testing process for the three approaches. (a) shows the training and testing process for the LR and 1D-CNN approaches, using 2018 Spring-term data as an example. Data collected during the first week is used for training and testing to predict end-of-term GPA for both 2018 and 2019. (b) shows the training and testing process for the MTL-1D-CNN approach, where training includes two tasks. Task 1 uses first-week data from 2018 to predict end-of-term GPA. Task 2 combines first-week data from both 2018 and 2019 to predict prior-term Winter GPA. Testing uses data from 2019 to predict end-of-term GPA.
  • Figure 3: Radar charts comparing the fairness performance of the three approaches (LR, 1D-CNN, MTL-1D-CNN) across four protected traits (race, first-generation status, gender, sexual orientation) using three fairness metrics: demographic parity, equalized odds, and equal opportunity. The first row shows the difference between groups within each protected trait; the light yellow shaded regions mark values between -0.1 and 0.1, indicating a reasonably fair difference. The second row shows the ratio; the light yellow shaded regions mark ratio values between 0.8 and 1.2, indicating reasonably fair performance.
  • Figure 4: Accuracy of the three approaches and the baselines in predicting academic performance consistency and transitions. It shows the percentage of students in each of four categories that each approach predicts correctly: remained a high performer, remained a low performer, improved from low to high performer, and declined from high to low performer.
  • Figure 5: Distribution of Spring-term GPA among all students, and distributions of Spring-term GPA for high and low performers, in 2018 and 2019. In 2018, 145 (77%) of 188 students are labeled as high performers ($M$ = 3.69, $std$ = 0.21) and 43 (23%) as low performers ($M$ = 2.76, $std$ = 0.39). In 2019, 133 (68%) of 196 students are labeled as high performers ($M$ = 3.63, $std$ = 0.24) and 63 (32%) as low performers ($M$ = 2.69, $std$ = 0.58).
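Figure 4 groups students by whether their performance label stayed the same or changed between terms. It then reports each approach's accuracy within each group. The following is a minimal sketch of that tally under several assumptions:

- the "before" label comes from the prior-term (Winter) GPA mentioned in Figure 2;
- the "after" label comes from the Spring GPA;
- both labels are already binarized (1 = high, 0 = low).

`transition_accuracy`, `CATEGORIES`, and the toy labels are hypothetical.

```python
from collections import defaultdict

# Maps (prior-term label, current-term label) to a Figure 4 category.
# ASSUMPTION: 1 = high performer, 0 = low performer.
CATEGORIES = {
    (1, 1): "remained high",
    (0, 0): "remained low",
    (0, 1): "improved low->high",
    (1, 0): "declined high->low",
}


def transition_accuracy(prior_labels, actual_labels, predicted_labels):
    """Percentage of students in each category whose current-term label was predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for prior, actual, predicted in zip(prior_labels, actual_labels, predicted_labels):
        category = CATEGORIES[(prior, actual)]
        total[category] += 1
        correct[category] += int(predicted == actual)
    return {c: 100.0 * correct[c] / total[c] for c in total}


# Toy labels for six hypothetical students.
prior = [1, 1, 0, 0, 1, 0]
actual = [1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(transition_accuracy(prior, actual, predicted))
# {'remained high': 50.0, 'remained low': 100.0, 'improved low->high': 100.0, 'declined high->low': 0.0}
```

Scoring each category separately keeps small groups, such as students who declined from high to low, from being masked by the larger group that stayed high.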