Predicting Long-Term Student Outcomes from Short-Term EdTech Log Data

Ge Gao, Amelia Leon, Andrea Jetten, Jasmine Turner, Husni Almoubayyed, Stephen Fancsali, Emma Brunskill

TL;DR

The paper tackles predicting long-term, external student outcomes from short-horizon EdTech log data across three diverse contexts (CWTLReading, MATHia, iReady). It introduces a cross-domain pipeline that extracts a broad set of interpretable features, including aggregated, per-problem, time-based, and sequential-pattern metrics, and trains linear regression (LR), support vector regression (SVR), and random forest (RF) models under 5-fold cross-validation. The key finding is that 2–5 hours of initial usage data can yield predictive signal close to that of the full log history, with RF often delivering the best accuracy and performance varying by dataset. Pre-assessment scores are strongly predictive and can further improve performance when available, though short-horizon logs alone offer meaningful insight for early intervention. The work demonstrates that the features generalize across contexts and offers practical guidance for deploying short-horizon predictors to support instructors and administrators, while highlighting limitations in individual-level accuracy and the need for cautious deployment.
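The evaluation protocol summarized above (5-fold cross-validation, compared against a baseline that predicts the training-set mean post-test score) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: a one-feature least-squares fit stands in for the LR, SVR, and RF models, RMSE is an assumed error metric, the data are synthetic, and all function names are hypothetical.

```python
import math
import random
import statistics


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error (assumed metric, not taken from the paper)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def cross_validate(x, y, k=5, seed=0):
    """Return {model: (mean RMSE, standard error)} across k held-out folds."""
    scores = {"baseline": [], "linear": []}
    for held_out in kfold_indices(len(x), k, seed):
        test = set(held_out)
        train = [i for i in range(len(x)) if i not in test]
        x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
        x_te, y_te = [x[i] for i in held_out], [y[i] for i in held_out]

        # Baseline: predict the training-set mean outcome for every student.
        mean_y = statistics.fmean(y_tr)
        scores["baseline"].append(rmse(y_te, [mean_y] * len(y_te)))

        # Stand-in model: least-squares line fitted on the training folds only.
        slope, intercept = fit_line(x_tr, y_tr)
        scores["linear"].append(rmse(y_te, [slope * v + intercept for v in x_te]))

    return {
        name: (statistics.fmean(vals), statistics.stdev(vals) / math.sqrt(len(vals)))
        for name, vals in scores.items()
    }


# Synthetic example: one log-derived feature (hypothetical) and a post-test score.
rng = random.Random(1)
x = [rng.uniform(0, 1) for _ in range(200)]
y = [50 + 30 * v + rng.gauss(0, 5) for v in x]
results = cross_validate(x, y)
for name, (mean_err, std_err) in results.items():
    print(f"{name}: RMSE {mean_err:.2f} +/- {std_err:.2f}")
```

Fitting on the training folds only, including the baseline mean, keeps the held-out fold untouched, so the comparison reflects performance on unseen students.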

Abstract

Educational stakeholders are often particularly interested in sparse, delayed student outcomes, like end-of-year statewide exams. The rarity of such assessments makes it harder to identify students likely to fail them, and makes it slow for researchers and educators to assess the effectiveness of particular educational tools. Prior work has primarily focused on using logs from students' full usage (e.g., year-long) of an educational product to predict outcomes, or has considered predictive accuracy using a few minutes of logs to predict outcomes after a short (e.g., 1 hour) session. In contrast, we investigate whether machine learning predictors using students' logs from their first few hours of usage can provide useful predictive insight into those students' end-of-school-year external assessments. We do this on three diverse datasets: one from students in Uganda using a literacy game product, and two from students in the US using mathematics intelligent tutoring systems. We consider various measures of the accuracy of the resulting predictors, including their ability to identify students at different points along the assessment performance distribution. Our findings suggest that short-term log usage data, from 2-5 hours, can provide valuable signal about students' long-term external performance.

Paper Structure

This paper contains 16 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Prediction performance (mean$\pm$standard error from 5-fold cross-validation) using cumulative log data over varying numbers of hours across datasets ($H$ is the horizon of the dataset, indicating that the entire log data is used for prediction). LR: linear regression; SVR: support vector regression; RF: random forest; Baseline: using mean post-test scores from the training set as predicted scores. Note that all figures on a row share the same y-axis scale, the x-axis is not evenly spaced (see tick marks and labels), and these models use only log data, with no additional student demographic or pre-assessment data.
  • Figure 2: Confusion matrices of the predicted quantile groups using the short-horizon log data (Y-axis) and the actual five subgroups in the training set (X-axis). The values are obtained from 5-fold cross-validation.
  • Figure 3: Confusion matrices of the predicted quantile groups using the short-horizon log data (Y-axis) and the actual five subgroups in the training set (X-axis) on CWTLReading and iReady with (Short+Pre-Test) and without (Short) pre-test scores. The values are obtained from LR with 5-fold cross-validation.
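
The quantile-group confusion matrices described in Figures 2 and 3 can be illustrated with a short sketch. It assumes the five subgroups are quintiles whose cut points are computed from training-set post-test scores, which the captions suggest but do not spell out; the scores and function names below are hypothetical.

```python
import statistics
from bisect import bisect_right


def quintile_edges(train_scores):
    """Four cut points splitting the training-set scores into five groups."""
    return statistics.quantiles(train_scores, n=5)


def to_group(score, edges):
    """Map a score to a group index 0 (lowest) .. 4 (highest)."""
    return bisect_right(edges, score)


def confusion_matrix(actual, predicted, edges):
    """Rows index the predicted group (Y-axis), columns the actual group (X-axis)."""
    matrix = [[0] * 5 for _ in range(5)]
    for a, p in zip(actual, predicted):
        matrix[to_group(p, edges)][to_group(a, edges)] += 1
    return matrix


# Hypothetical scores: training outcomes define the edges, then held-out
# students are binned by both their actual and their predicted score.
train_scores = list(range(1, 101))
edges = quintile_edges(train_scores)
actual = [10, 30, 50, 70, 90]
predicted = [12, 45, 50, 95, 60]
matrix = confusion_matrix(actual, predicted, edges)
for row in matrix:
    print(row)
```

Counts on the diagonal are students placed in the correct group; off-diagonal mass far from the diagonal corresponds to the individual-level errors that the summary cautions about.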