The Crowd in MOOCs: A Study of Learning Patterns at Scale

Xin Zhou, Aixin Sun, Jie Zhang, Donghui Lin

TL;DR

This paper analyzes 351 million MOOC learning activities across 1,629 courses to characterize temporal and enrollment patterns at scale. It combines temporal analysis with mutual information and sequential pattern mining to reveal structured daily/weekly activity, power-law inter-event distributions with a $T=24$ hour cosine modulation, and co-enrollment primarily within the same category or university, plus intra-category transition dominance. It then introduces FrePaPop, a fast, transition-based recommender that leverages frequent course transitions and achieves competitive accuracy while reducing training time by more than 200× compared to baselines. The findings offer scalable, practical insights for course recommendation, prerequisite inference, and learning-path design, providing a meaningful baseline for future large-scale MOOC analytics, including cross-platform generalization and pandemic-era learning behavior analyses.

Abstract

The increasing availability of learning activity data in Massive Open Online Courses (MOOCs) enables large-scale analysis of learners' behavior. In this paper, we analyze a dataset of 351 million learning activities from 0.8 million unique learners enrolled in over 1.6 thousand courses within two years. Specifically, we mine and identify the learning patterns of the crowd from both temporal and course enrollment perspectives, leveraging mutual information theory and sequential pattern mining methods. From the temporal perspective, we find that the time intervals between consecutive learning activities follow a distribution that mixes a power law with a periodic cosine function. By quantifying the relationship between course pairs, we observe that the most frequently co-enrolled courses usually fall in the same category or belong to the same university. We demonstrate that these findings can facilitate a range of applications, including course recommendation. A simple recommendation model utilizing the course enrollment patterns is competitive with the baselines while training 200$\times$ faster.
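
To make the idea of a recommender built on course enrollment patterns concrete, the following is a minimal sketch of a transition-count recommender with a popularity fallback. It is an illustration under stated assumptions, not the paper's FrePaPop implementation: the function names `fit_transitions` and `recommend`, the toy sequences, and the fallback rule are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count course-to-course transitions and overall course popularity.

    `sequences` is a list of per-learner course lists in enrollment order
    (an assumed input format, not taken from the paper).
    """
    transitions = defaultdict(Counter)
    popularity = Counter()
    for seq in sequences:
        popularity.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            transitions[prev][nxt] += 1
    return transitions, popularity


def recommend(last_course, taken, transitions, popularity, k=3):
    """Rank candidates by transition count from `last_course`,
    then pad with globally popular courses not yet taken."""
    taken = set(taken)
    followers = transitions.get(last_course, Counter())
    ranked = [c for c, _ in followers.most_common() if c not in taken]
    for course, _ in popularity.most_common():
        if len(ranked) >= k:
            break
        if course not in taken and course not in ranked:
            ranked.append(course)
    return ranked[:k]


# Toy enrollment sequences (fabricated for illustration only).
sequences = [
    ["python", "ml", "dl"],
    ["python", "ml", "stats"],
    ["python", "stats"],
    ["calculus", "stats"],
]
transitions, popularity = fit_transitions(sequences)
top_after_python = recommend("python", ["python"], transitions, popularity, k=2)
fallback_after_dl = recommend("dl", ["python", "ml", "dl"], transitions, popularity, k=2)
```

"Training" here is a single counting pass over the sequences, which suggests why a transition-based model can train far faster than learned baselines. The sketch itself makes no claim about accuracy.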

Paper Structure (18 sections, 7 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: The distribution of learners' daily activity counts on the platform across a two-year period. The black curve represents the first year, while the cyan curve corresponds to the second year. Dotted lines mark Sundays, and red dashed lines mark public holidays.
  • Figure 2: Distribution of learners' activities on a daily and weekly basis.
  • Figure 3: Distribution of the number of time intervals between consecutive activities. Gray dotted lines are daily indicators, and red dashed lines are weekly indicators. $y$-axis is in log scale.
  • Figure 4: Jaccard similarity index and the network topology of co-enrolled courses. In the network topology plot, each node denotes a course. Node size is proportional to the course's number of enrollments, and nodes with the same color belong to the same course category.
  • Figure 5: Heatmap of the category-level relationship of co-enrolled courses.
  • ...and 1 more figure
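
The Jaccard similarity index named in the Figure 4 caption can be illustrated with a short sketch. It assumes each course is represented by the set of learner IDs enrolled in it, so that the similarity of two courses is $|A \cap B| / |A \cup B|$. That representation, the helper names, and the toy data are assumptions for illustration, not details confirmed by the source.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def course_pair_similarity(enrollments):
    """Compute the Jaccard index for every unordered course pair.

    `enrollments` maps a course name to an iterable of learner IDs
    (a hypothetical input format).
    """
    return {
        (c1, c2): jaccard(enrollments[c1], enrollments[c2])
        for c1, c2 in combinations(sorted(enrollments), 2)
    }


# Toy enrollment sets (fabricated for illustration only).
enrollments = {
    "ml": {1, 2, 3},
    "dl": {2, 3, 4},
    "art": {5},
}
similarities = course_pair_similarity(enrollments)
# ("dl", "ml") share learners 2 and 3 out of 4 distinct learners -> 0.5
```

Computing all pairs is quadratic in the number of courses, which is manageable for around 1,600 courses. At larger scales one would likely restrict the computation to pairs that share at least one learner.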