The Crowd in MOOCs: A Study of Learning Patterns at Scale
Xin Zhou, Aixin Sun, Jie Zhang, Donghui Lin
TL;DR
This paper analyzes 351 million MOOC learning activities across 1,629 courses to characterize temporal and enrollment patterns at scale. Combining temporal analysis with mutual information and sequential pattern mining, it finds structured daily and weekly activity rhythms, inter-event times that follow a power law modulated by a cosine with period $T=24$ hours, co-enrollment concentrated within the same category or university, and course transitions that stay mostly within a category. It then introduces FrePaPop, a fast recommender built on frequent course transitions, which achieves competitive accuracy while cutting training time by more than 200× relative to the baselines. The findings offer scalable, practical insights for course recommendation, prerequisite inference, and learning-path design, and provide a baseline for future large-scale MOOC analytics, including cross-platform generalization and analyses of pandemic-era learning behavior.
Abstract
The increasing availability of learning activity data in Massive Open Online Courses (MOOCs) enables large-scale analysis of learners' behavior. In this paper, we analyze a dataset of 351 million learning activities from 0.8 million unique learners enrolled in over 1.6 thousand courses within two years. Specifically, we mine and identify the learning patterns of the crowd from both temporal and course enrollment perspectives, leveraging mutual information theory and sequential pattern mining methods. From the temporal perspective, we find that the time intervals between consecutive learning activities follow a mixture of a power-law distribution and a periodic cosine function. By quantifying the relationship between course pairs, we observe that the most frequently co-enrolled courses usually fall in the same category or are offered by the same university. We demonstrate that these findings can facilitate a range of applications, including course recommendation. A simple recommendation model utilizing the course enrollment patterns is competitive with the baselines while training 200$\times$ faster.
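To make the idea of a recommender driven by course enrollment patterns concrete, the following is a minimal sketch and not the authors' model. It assumes each learner's enrollments are available as a chronologically ordered list of course identifiers, counts how often one course is followed directly by another, and falls back to overall popularity when no transition is known. The function names, the fallback rule, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_counts(sequences):
    """Count direct course-to-course transitions and overall course popularity.

    `sequences` is an iterable of per-learner enrollment lists, ordered by time
    (an assumed input format, not one specified by the paper).
    """
    transitions = defaultdict(Counter)
    popularity = Counter()
    for seq in sequences:
        popularity.update(seq)
        # Pair each course with the one enrolled immediately after it.
        for prev_course, next_course in zip(seq, seq[1:]):
            transitions[prev_course][next_course] += 1
    return transitions, popularity


def recommend(history, transitions, popularity, k=3):
    """Rank unseen courses by transition count from the learner's last course,
    then fill any remaining slots with globally popular courses."""
    seen = set(history)
    ranked = []
    if history:
        last_course = history[-1]
        for course, _ in transitions.get(last_course, Counter()).most_common():
            if course not in seen:
                ranked.append(course)
    for course, _ in popularity.most_common():
        if course not in seen and course not in ranked:
            ranked.append(course)
    return ranked[:k]


# Toy enrollment sequences (fabricated purely for illustration).
sequences = [
    ["python", "ml", "dl"],
    ["python", "ml", "stats"],
    ["python", "stats"],
    ["calculus", "stats", "ml"],
]
transitions, popularity = build_transition_counts(sequences)
print(recommend(["python"], transitions, popularity, k=2))
```

Because "training" here is a single counting pass over the enrollment sequences, a model of this kind has no iterative optimization step, which gives some intuition for why a transition-based approach can train much faster than learned baselines.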
