Integrating behavior analysis with machine learning to predict online learning performance: A scientometric review and empirical study

Jin Yuan, Xuelan Qiu, Jinran Wu, Jiesi Guo, Weide Li, You-Gan Wang

TL;DR

The paper addresses a gap in online learning performance prediction: most studies apply ML without accounting for learning behavior patterns, which can reduce prediction accuracy. It introduces a two-stage integration framework that clusters students by learning behavior and then applies ML models (notably XGBoost) within each cluster, and it contrasts this with a direct approach that skips the behavior analysis. Empirical results on the edX HarvardX2014 dataset identify two patterns, low autonomy and motivated, with near-perfect prediction for the former and solid performance for the latter. Behavior-aware modeling yields clear gains over the direct approach, and SHAP analyses improve interpretability. These findings highlight the practical value of incorporating behavior analysis into predictive pipelines, both to enhance accuracy and to provide actionable insights for tailoring online instruction.

Abstract

Interest in predicting online learning performance using machine learning (ML) algorithms has been steadily increasing. We first conducted a scientometric analysis to provide a systematic review of research in this area. The findings show that most existing studies apply ML methods without considering learning behavior patterns, which may compromise the accuracy and precision of their predictions. This study proposes an integration framework that blends learning behavior analysis with ML algorithms to enhance the prediction accuracy of students' online learning performance. Specifically, the framework identifies distinct learning patterns among students by employing clustering analysis and implements various ML algorithms to predict performance within each pattern. For demonstration, the integration framework is applied to a real dataset from edX and distinguishes two learning patterns, namely low autonomy students and motivated students. The results show that the framework yields nearly perfect prediction performance for low autonomy students and satisfactory performance for motivated students. Additionally, this study compares the prediction performance of the integration framework with that of directly applying ML methods without learning behavior analysis, using comprehensive evaluation metrics. The results consistently demonstrate the superiority of the integration framework over the direct approach, particularly when integrated with the best-performing XGBoost method. Moreover, the framework significantly improves prediction accuracy for the motivated students and for the worst-performing random forest method. This study also evaluates the importance of various learning behaviors within each pattern using LightGBM with SHAP values. The implications of the integration framework and the results for online education practice and future research are discussed.
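
The two-stage idea described in the abstract (cluster learners by behavior, then fit a separate model per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the three feature columns are hypothetical behavior counts, k-means stands in for the unspecified clustering method, and scikit-learn's GradientBoostingClassifier is swapped in for XGBoost to keep the example dependency-light. The function names are invented for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def fit_integration_framework(X, y, n_patterns=2, seed=0):
    """Stage 1: cluster learners by behavior. Stage 2: fit one model per cluster."""
    scaler = StandardScaler().fit(X)
    clusterer = KMeans(n_clusters=n_patterns, n_init=10, random_state=seed)
    labels = clusterer.fit_predict(scaler.transform(X))
    models = {}
    for k in range(n_patterns):
        mask = labels == k
        if np.unique(y[mask]).size < 2:
            # Degenerate cluster with a single outcome: store the constant class.
            models[k] = int(y[mask][0])
        else:
            models[k] = GradientBoostingClassifier(random_state=seed).fit(X[mask], y[mask])
    return scaler, clusterer, models


def predict_integration_framework(fitted, X):
    """Route each learner to the model of its assigned behavior pattern."""
    scaler, clusterer, models = fitted
    labels = clusterer.predict(scaler.transform(X))
    y_pred = np.empty(len(X), dtype=int)
    for k, model in models.items():
        mask = labels == k
        if not mask.any():
            continue
        y_pred[mask] = model if isinstance(model, int) else model.predict(X[mask])
    return y_pred


# Synthetic stand-in data (assumption): two behavior groups, three hypothetical features.
rng = np.random.default_rng(0)
n = 600
low = rng.normal(loc=[2, 1, 1], scale=1.0, size=(n // 2, 3))
high = rng.normal(loc=[8, 6, 5], scale=1.5, size=(n // 2, 3))
X = np.vstack([low, high])
# The outcome depends on a different feature in each group (assumption for illustration).
y = np.concatenate([
    (low[:, 0] + rng.normal(0, 0.5, n // 2) > 3.5).astype(int),
    (high[:, 1] + rng.normal(0, 1.0, n // 2) > 6.0).astype(int),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Integration framework: cluster first, then predict within each pattern.
fitted = fit_integration_framework(X_train, y_train)
y_int = predict_integration_framework(fitted, X_test)

# Direct approach: one model on all learners, no behavior analysis.
direct = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
y_dir = direct.predict(X_test)

print("integration accuracy:", (y_int == y_test).mean())
print("direct accuracy:", (y_dir == y_test).mean())
```

The cluster assignment is learned on the training split only and reused at prediction time, so test learners never influence the pattern definitions. Whether the per-cluster models beat the direct model depends on the data; this sketch only shows how the two pipelines differ structurally.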

Paper Structure

This paper contains 24 sections, 1 equation, 12 figures, 9 tables.

Figures (12)

  • Figure 1: A visualization of keyword co-occurrence network using CiteSpace
  • Figure 2: A timeline visualization of research focus (2014--2023)
  • Figure 3: The integration framework for predicting learning performance (right side) against the direct approach without behavior analysis (left side)
  • Figure 4: Heat map for correlations between features
  • Figure 5: Comparison of learning patterns across age (left) and gender (right) groups
  • ...and 7 more figures