ClickTree: A Tree-based Method for Predicting Math Students' Performance Based on Clickstream Data

Narjes Rohani, Behnam Rohani, Areti Manataki

Abstract

The prediction of student performance and the analysis of students' learning behavior play an important role in enhancing online courses. By analysing a massive amount of clickstream data that captures student behavior, educators can gain valuable insights into the factors that influence academic outcomes and identify areas for improvement in courses. In this study, we developed ClickTree, a tree-based methodology, to predict student performance in mathematical assignments based on students' clickstream data. We extracted a set of features, including problem-level, assignment-level and student-level features, from the extensive clickstream data and trained a CatBoost classifier to predict whether a student successfully answers a problem in an assignment. The developed method achieved an AUC of 0.78844 in the Educational Data Mining Cup 2023 and ranked second in the competition. Furthermore, our results indicate that students encounter more difficulty with problem types that require selecting a subset of answers from a given set, as well as with problems on Algebra II subjects. Additionally, students who performed well in answering end-unit assignment problems engaged more with in-unit assignments and answered more problems correctly, while those who struggled had a higher tutoring request rate. The proposed method can be utilized to improve students' learning experiences, and the above insights can be integrated into mathematical courses to enhance students' learning outcomes.
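
As a rough illustration of two ideas mentioned in the abstract, the sketch below aggregates a toy clickstream into student-level features and computes AUC from pairwise comparisons. It is not the authors' pipeline: the action names, the feature definitions, and the helper names `student_features` and `auc` are all hypothetical, and no CatBoost model is trained here.

```python
from collections import defaultdict

# Hypothetical toy clickstream of (student_id, action) pairs.
# The action names are illustrative assumptions, not the dataset's schema.
TOY_LOG = [
    ("s1", "correct_response"),
    ("s1", "correct_response"),
    ("s1", "hint_requested"),
    ("s2", "wrong_response"),
    ("s2", "hint_requested"),
    ("s2", "hint_requested"),
    ("s2", "correct_response"),
]


def student_features(log):
    """Aggregate raw actions into two assumed student-level features."""
    counts = defaultdict(lambda: defaultdict(int))
    for student, action in log:
        counts[student][action] += 1

    features = {}
    for student, c in counts.items():
        total = sum(c.values())
        answered = c["correct_response"] + c["wrong_response"]
        features[student] = {
            # Share of all actions that were tutoring (hint) requests.
            "tutoring_request_rate": c["hint_requested"] / total,
            # Share of answered problems that were answered correctly.
            "correct_rate": c["correct_response"] / answered if answered else 0.0,
        }
    return features


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(student_features(TOY_LOG))
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75
```

The pairwise form of AUC is quadratic in the number of examples, so it only suits small illustrations; on real data a library routine would be the more practical choice.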

Paper Structure (17 sections, 4 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: a: Average scores for different problem types. b: The 15 most difficult problem skills with more than 100 occurrences in the whole dataset (a problem skill is the skill required to solve a problem). c: Average scores across assignment topics/grades. d: The 15 most difficult problem subjects with more than 100 occurrences in the whole dataset.
  • Figure 2: AUC values of the different methods on the validation data.
  • Figure 3: Top 10 most important features for the CatBoost classifier, based on information gain.