Mining Hierarchies with Conviction: Constructing the CS1 Skill Hierarchy with Pairwise Comparisons over Skill Distributions
Dip Kiran Pradhan Newar, Max Fowler, David H. Smith, Seth Poulsen
TL;DR
This work addresses the problem of establishing prerequisite relationships among five CS1 programming skills by applying Conviction, a directional measure from association rule mining, to binarized exam data collected from four CS1 Python exams (n > 600). The authors binarize scores using multiple thresholds and analyze all ten skill pairs with Wilcoxon tests and Bonferroni correction. They find that Trace reliably precedes Write, Explain, and Sequence, while Write precedes Explain under a mean threshold but not consistently under a median threshold. They also observe co-requisite patterns, such as Sequence↔Explain and Write↔Sequence, suggesting tighter coupling among some skills. The study contributes a data-driven, direction-aware skill hierarchy that can guide CS1 teaching sequences and assessments, while noting limitations related to scoring methods and calling for further validation across contexts.
Abstract
Background and Context: Skills taught in introductory programming courses include 1) explaining code, 2) arranging lines of code in the correct sequence, 3) tracing through the execution of a program, and 4) writing code from scratch. Objective: Knowing whether one programming skill is a prerequisite to another would help teachers plan the course and decide the order in which to present activities relating to new content. Prior attempts to establish a skill hierarchy have suffered from methodological issues. Method: In this study, we used the conviction measure from association rule mining to perform pairwise comparisons of five skills: Write, Trace, Reverse trace, Sequence, and Explain code. We used data from four exams with more than 600 participants, in which students solved programming assignments targeting different skills across several programming topics. Findings: Our results agree with the previous finding that tracing is a prerequisite for students to learn to write code. Contrary to previous claims, our analysis showed that, under the mean threshold, writing code is a prerequisite to explaining code. However, no clear relationship appears when we change the threshold to the median. Unlike prior work, we did not find a clear prerequisite relationship between sequencing code and writing or explaining code. Implications: Our research can help instructors systematically arrange the skills students exercise when encountering a new topic. The goal is to help instructors teach and assess programming in the way most effective for learning by leveraging the relationships between skills.
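
As a rough illustration of the conviction measure named in the Method, the sketch below computes the standard association-rule conviction, conviction(A → B) = P(A) · P(¬B) / P(A ∧ ¬B), in both directions for a single skill pair after binarizing scores at the mean. The scores, the function names `binarize` and `conviction`, and the `>=` cutoff rule are illustrative assumptions, not the authors' data or code, and the statistical testing step is omitted.

```python
from statistics import mean


def binarize(scores, threshold):
    """Mark a student as successful on a skill if the score meets the threshold.

    Using >= (rather than >) is an assumption made for this sketch.
    """
    return [s >= threshold for s in scores]


def conviction(a, b):
    """Conviction of the rule A -> B over paired boolean lists.

    conviction(A -> B) = P(A) * P(not B) / P(A and not B).
    A value of 1 means A and B look independent; larger values mean
    the rule "A implies B" is violated less often than chance predicts.
    """
    n = len(a)
    count_a = sum(a)
    count_not_b = sum(not y for y in b)
    count_a_not_b = sum(x and not y for x, y in zip(a, b))
    if count_a_not_b == 0:
        # The rule is never violated in the sample.
        return float("inf")
    return (count_a * count_not_b) / (n * count_a_not_b)


# Hypothetical per-student scores for two skills (same student order).
trace_scores = [90, 80, 85, 70, 95, 75, 30, 20, 40, 10]
write_scores = [80, 70, 90, 20, 30, 10, 60, 15, 25, 5]

# Binarize each skill at its own mean score.
trace_ok = binarize(trace_scores, mean(trace_scores))
write_ok = binarize(write_scores, mean(write_scores))

# Compare the two directions of the rule for this skill pair.
write_implies_trace = conviction(write_ok, trace_ok)
trace_implies_write = conviction(trace_ok, write_ok)

print("conviction(Write -> Trace):", write_implies_trace)
print("conviction(Trace -> Write):", trace_implies_write)
```

On this toy data, conviction(Write → Trace) is 1.6 and conviction(Trace → Write) is 1.2. One possible reading, assumed here rather than taken from this section, is that the asymmetry favors Trace as the prerequisite: students who succeed at writing rarely fail at tracing, while the converse holds less strongly. Swapping `mean` for `statistics.median` gives the median-threshold variant.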
