Uncovering Student Engagement Patterns in Moodle with Interpretable Machine Learning

Laura J. Johnston, Jim E. Griffin, Ioanna Manolopoulou, Takoua Jendoubi

TL;DR

The paper addresses the challenge of quantifying student engagement from VLE log data by proposing a weekly, chapter-structured engagement metric that combines immediacy, frequency, and diversity. It evaluates nine regression models under nested cross-validation, emphasising the interpretability of Generalised Additive Models (GAMs) and the predictive strength of Random Forests. In a case study of a UCL computing module, the authors identify the early weeks and the first assessment period as critical for engagement, while the impact of delivery method remains inconclusive because results are inconsistent across models. The work contributes to learning analytics by refining engagement measurement, demonstrating actionable weekly predictors, and enabling data-driven teaching strategies for proactive student support.

Abstract

Understanding and enhancing student engagement through digital platforms is critical in higher education. This study introduces a methodology for quantifying engagement across an entire module using virtual learning environment (VLE) activity log data. Using study session frequency, immediacy, and diversity, we create a cumulative engagement metric and model it against weekly VLE interactions with resources to identify critical periods and resources predictive of student engagement. In a case study of a computing module at University College London's Department of Statistical Science, we further examine how delivery methods (online, hybrid, in-person) impact student behaviour. Across nine regression models, we validate the consistency of the random forest model and highlight the interpretive strengths of generalised additive models for analysing engagement patterns. Results show weekly VLE clicks as reliable engagement predictors, with early weeks and the first assessment period being key. However, the impact of delivery methods on engagement is inconclusive due to inconsistencies across models. These findings support early intervention strategies to assist students at risk of disengagement. This work contributes to learning analytics research by proposing a refined VLE-based engagement metric and advancing data-driven teaching strategies in higher education.

Paper Structure

This paper contains 32 sections, 13 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Scatter plots illustrating the relationship between the student engagement metric and the four predictors — clicks, lecture note accesses, video views, and quiz submissions — for weeks 7 (term 1) and 21 (term 2). Each point represents a sample colour-coded by the delivery method (online, hybrid, in-person).
  • Figure 2: Boxplots comparing the out-of-sample RMSE and R-squared values across ten folds for the nine regression models.
  • Figure 3: Bar plots showing each final model's mean residual and RMSE, segmented by student engagement levels. We assign student engagement levels according to the quintiles of the student engagement metric (very low - very high).
  • Figure 4: Heatmaps illustrate the significance of predictors in the rGAM and variable importance in the Random Forest model. In the rGAM heatmap, colours represent the significance level of each predictor according to its p-value; darker shades indicate higher significance. White indicates predictors that were either insignificant or not included in the model. For the Random Forest heatmap, the right-side legend denotes the importance of each predictor, with comparative magnitudes offering more insight into their relative importance than absolute values. This visualisation aids in understanding which predictors are most influential across both modelling approaches.
  • Figure 5: Smooth functions from the rGAM model for click predictors across weeks 9, 13, and 14 in term one, and 20, 21, and 22 in term two. Each curve represents the modelled relationship between the number of clicks and student engagement for the respective week, with confidence intervals shaded in light blue. The mean value before standardisation is labelled in red, while the value at two standard deviations above the mean is labelled in blue. A rug plot on the x-axis shows the distribution of data points.
  • ...and 1 more figures
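
The segmentation described in the Figure 3 caption (engagement levels assigned by quintiles of the engagement metric, then mean residual and RMSE reported per level) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the intermediate level labels ("low", "medium", "high"), the residual sign convention (observed minus predicted), and the quantile interpolation method are all assumptions made for illustration.

```python
import math
import statistics

# Only "very low" and "very high" appear in the caption; the three
# intermediate labels are assumed for illustration.
LEVELS = ["very low", "low", "medium", "high", "very high"]


def engagement_level(value, cuts):
    """Map a metric value to one of five levels given four quintile cut points."""
    return LEVELS[sum(value > c for c in cuts)]


def errors_by_level(actual, predicted):
    """Summarise residuals per engagement quintile.

    Residuals are taken as observed minus predicted (an assumed convention).
    """
    # statistics.quantiles with n=5 returns the four quintile cut points.
    cuts = statistics.quantiles(actual, n=5)
    groups = {level: [] for level in LEVELS}
    for y, y_hat in zip(actual, predicted):
        groups[engagement_level(y, cuts)].append(y - y_hat)

    summary = {}
    for level, residuals in groups.items():
        if residuals:  # skip empty levels
            n = len(residuals)
            summary[level] = {
                "n": n,
                "mean_residual": sum(residuals) / n,
                "rmse": math.sqrt(sum(r * r for r in residuals) / n),
            }
    return summary


if __name__ == "__main__":
    # Toy values, not data from the paper.
    observed = [float(v) for v in range(1, 11)]
    fitted = [v - 1.0 for v in observed]
    for level, stats in errors_by_level(observed, fitted).items():
        print(level, stats)
```

Splitting errors by engagement level in this way shows whether a model systematically over- or under-predicts for particular groups of students, which a single overall RMSE would hide.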