Rule-Based Error Classification for Analyzing Differences in Frequent Errors
Atsushi Shirafuji, Taku Matsumoto, Md Faizul Ibne Amin, Yutaka Watanobe
TL;DR
The paper analyzes how frequent programming errors differ between novices and experts, using a rule-based error classification tool applied to pairs of wrong-answer (WA) and accepted (AC) code from AOJ. The approach has five stages: data collection from AOJ ITP1; code normalization, which tokenizes and sanitizes code while preserving its structure; change extraction with line-level and token-level labeling; error classification with 55 predefined rules implemented as regular expressions; and a difference analysis comparing novices and experts. The tool covers both syntax and logic errors, and manual validation on 1,000 pairs gives an accuracy of 91.71%. The difference analysis uses a chi-square test at $\alpha = 0.05$ with standardized Pearson residuals $r_{ij}$ to identify which error types contribute to the differences. It finds that novices' errors stem from gaps in fundamental programming knowledge, while experts' errors arise from misreading problems or from attempting to solve problems differently than usual. The authors also present a dataset of 95,631 labeled code pairs from 44 problems that can support further educational research, such as error detection and fix suggestion.
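The difference-analysis step can be illustrated with a minimal sketch. It assumes the adjusted form of the standardized Pearson residual, $r_{ij} = (O_{ij} - E_{ij}) / \sqrt{E_{ij}(1 - p_{i+})(1 - p_{+j})}$, which may differ in detail from the authors' computation. The function name and the counts below are hypothetical and are not taken from the paper.

```python
from math import sqrt


def chi_square_with_residuals(table):
    """Chi-square statistic and standardized Pearson residuals.

    table: list of rows of observed counts
           (rows = programmer groups, columns = error types).
    Returns (statistic, degrees_of_freedom, residuals).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)

    statistic = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            statistic += (observed - expected) ** 2 / expected
            # Adjusted (standardized) residual: assumed form, see lead-in.
            denom = sqrt(expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            res_row.append((observed - expected) / denom)
        residuals.append(res_row)

    dof = (len(table) - 1) * (len(col_tot) - 1)
    return statistic, dof, residuals


# Toy counts, invented for illustration only (not the paper's data).
# Rows: novices, experts. Columns: three hypothetical error types.
toy = [[30, 10, 20],
       [10, 30, 20]]
statistic, dof, residuals = chi_square_with_residuals(toy)
```

For these toy counts the statistic is 20 with 2 degrees of freedom, above the 5% critical value of 5.991, so the two groups would be judged to differ. A common convention, assumed here rather than taken from the paper, is to flag cells with $|r_{ij}| > 1.96$ as the error types driving the difference.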
Abstract
Finding and fixing errors is a time-consuming task not only for novice programmers but also for expert programmers. Prior work has identified frequent error patterns among programmers of various levels. However, the differences in tendencies between novices and experts have yet to be revealed. Knowing the frequent errors at each level would allow instructors to give helpful advice tailored to each level of learner. In this paper, we propose a rule-based error classification tool to classify errors in code pairs consisting of wrong and correct programs. We classify errors for 95,631 code pairs submitted by programmers of various levels on an online judge system, and identify 3.47 errors per pair on average. The classified errors are used to analyze the differences in frequent errors between novice and expert programmers. The results show that, for the same introductory problems, errors made by novices are due to a lack of programming knowledge, and these mistakes are considered an essential part of the learning process. Errors made by experts, on the other hand, are due to misunderstandings caused by careless reading of the problems or to the challenge of solving problems differently than usual. The proposed tool can be used to create error-labeled datasets and for further code-related educational research.
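The idea of classifying errors in a wrong/correct code pair with rules can be sketched as follows. This is only an illustration under stated assumptions: it works at the line level with Python's `difflib`, skips the normalization stage, and uses three hypothetical rules (`RULES`) that are not among the paper's 55 rules. The toy programs are written in Python for convenience.

```python
import difflib
import re

# Hypothetical rules for illustration (NOT the paper's rule set):
# each label maps to a regex applied to every changed line.
RULES = {
    "comparison_operator": re.compile(r"(<=|>=|==|!=|<|>)"),
    "output_format": re.compile(r"\bprint\s*\("),
    "loop_range": re.compile(r"\brange\s*\("),
}


def changed_lines(wrong, correct):
    """Return lines removed from `wrong` or added in `correct`."""
    diff = difflib.ndiff(wrong.splitlines(), correct.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are skipped.
    return [line[2:] for line in diff if line.startswith(("- ", "+ "))]


def classify(wrong, correct):
    """Return the sorted labels of all rules matching any changed line."""
    labels = set()
    for line in changed_lines(wrong, correct):
        for label, pattern in RULES.items():
            if pattern.search(line):
                labels.add(label)
    return sorted(labels)


# Toy pair: the wrong program iterates from 0, the correct one from 1.
wrong_code = "n = int(input())\nfor i in range(n):\n    print(i)\n"
correct_code = "n = int(input())\nfor i in range(1, n + 1):\n    print(i)\n"
labels = classify(wrong_code, correct_code)
```

Here only the `for` line differs between the two programs, so the sketch reports the single hypothetical label `loop_range`. A pair can receive several labels when its changed lines match several rules, which is consistent with the abstract reporting more than one error per pair on average.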
