Mining patterns in syntax trees to automate code reviews of student solutions for programming exercises

Charlotte Van Petegem; Kasper Demeyere; Rien Maertens; Niko Strijbol; Bram De Wever; Bart Mesuere; Peter Dawyndt

TL;DR

This work tackles the labor-intensity and inconsistency of manual feedback in programming education by introducing ECHO, an AST-based pattern mining approach to reuse feedback annotations from prior code reviews. ECHO builds a language-agnostic pipeline: tree-sitter creates AST contexts, TreeminerD discovers frequent subtree patterns, and a weighted scoring mechanism ranks potential feedback for new submissions. The weights and scores are defined as $\text{weight}(pattern) = \frac{\text{len}(pattern)}{\#\text{occurrences}(pattern)}$ and $\text{score}(annotation) = \frac{\sum_{pattern \in patterns} \text{weight}(pattern)\,[pattern\ \text{matches}]}{\lvert patterns \rvert}$, where the bracket $[pattern\ \text{matches}]$ is 1 if the pattern matches and 0 otherwise. Empirical results show strong top-5 accuracy for both automated Pylint annotations and human reviewer annotations, with training and per-annotation prediction times generally in the sub-second to sub-millisecond range, enabling real-time assistance during live reviews. The findings suggest that integrating ECHO into learning platforms can reduce manual feedback effort while improving consistency, and point to future directions including confidence-scored suggestions, broader context, speed optimizations, and potential hybrid feedback with LLM-based systems.
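The weighting and scoring formulas above can be illustrated with a minimal Python sketch. It is not ECHO's implementation and rests on several assumptions: patterns are encoded as flat tuples of node-type names (a hypothetical simplification of subtree patterns), `#occurrences` is taken to be the number of annotations whose pattern set contains the pattern, and `matches` is a simple ordered-subsequence check standing in for real subtree matching. All function and variable names are illustrative.

```python
from collections import Counter

# Hypothetical flat encoding: a pattern is a tuple of AST node-type names.
Pattern = tuple[str, ...]


def pattern_weights(model: dict[str, set[Pattern]]) -> dict[Pattern, float]:
    """weight(pattern) = len(pattern) / #occurrences(pattern).

    Assumption: #occurrences counts the annotations whose pattern set
    contains the pattern, so patterns shared by many annotations weigh less.
    """
    occurrences = Counter()
    for patterns in model.values():
        occurrences.update(patterns)
    return {p: len(p) / n for p, n in occurrences.items()}


def matches(pattern: Pattern, context: tuple[str, ...]) -> bool:
    """Stand-in matcher (assumption): pattern is an ordered subsequence of context."""
    remaining = iter(context)
    return all(token in remaining for token in pattern)


def score(patterns: set[Pattern], weights: dict[Pattern, float],
          context: tuple[str, ...]) -> float:
    """Sum of the weights of matching patterns, divided by the number of patterns."""
    if not patterns:
        return 0.0
    return sum(weights[p] for p in patterns if matches(p, context)) / len(patterns)


def rank(model: dict[str, set[Pattern]], context: tuple[str, ...]) -> list[str]:
    """Order annotations from highest to lowest score for one line's context."""
    weights = pattern_weights(model)
    return sorted(model, key=lambda a: score(model[a], weights, context), reverse=True)


# Illustrative data only; the annotation names and patterns are made up.
model = {
    "unused-variable": {("assignment", "identifier")},
    "use-enumerate": {("for_statement", "call", "identifier"),
                      ("assignment", "identifier")},
}
context = ("for_statement", "identifier", "call", "identifier", "argument_list")
ranking = rank(model, context)
```

In this sketch, longer patterns weigh more and patterns shared across annotations weigh less, which is how the formula rewards specific evidence over generic structure.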

Abstract

In programming education, providing manual feedback is essential but labour-intensive, posing challenges in consistency and timeliness. We introduce ECHO, a machine learning method to automate the reuse of feedback in educational code reviews by analysing patterns in abstract syntax trees. This study investigates two primary questions: whether ECHO can predict feedback annotations to specific lines of student code based on previously added annotations by human reviewers (RQ1), and whether its training and prediction speeds are suitable for using ECHO for real-time feedback during live code reviews by human reviewers (RQ2). Our results, based on annotations from both automated linting tools and human reviewers, show that ECHO can accurately and quickly predict appropriate feedback annotations. Its efficiency in processing and its flexibility in adapting to feedback patterns can significantly reduce the time and effort required for manual feedback provisioning in educational settings.

Paper Structure (8 sections, 2 equations, 16 figures, 2 tables)

Introduction
Methodology
Training
Ranking
Results and discussion
Machine annotations (Pylint)
Human annotations
Conclusions and future work

Figures (16)

Figure 1: Assessment of a submitted solution in Dodona. An automated assessment has already been performed, with 22 failed test cases, as can be seen from the badge on the "Correctness" tab. An automated annotation left by Pylint can be seen on line 22. A teacher gives feedback on the code by adding inline annotations and scores the submission by filling out the exercise-specific scoring rubric. The teacher has just searched for a previously saved annotation so that they could reuse it. After manually assessing this submission, the teacher still has another 23 submissions to assess, as shown in the progress bar on the right.
Figure 2: Overview of ECHO. Code of previously reviewed submissions is converted to its abstract syntax tree (AST) form. Instances of the same annotation have the same colour. For each annotation, the context of each instance is extracted and mined for patterns using the TreeminerD algorithm. These patterns are then weighted to form our model. When a reviewer wants to place an annotation on a line of the submission they are currently reviewing, all previously given annotations are ranked based on the similarity determined for that line. The reviewer can then choose which annotation they want to place, with the aim of having the selected annotation at the top of the ranking.
Figure 3: AST subtree corresponding to line 3 in Listing \ref{lst:feedbacksubtreesample} as generated by tree-sitter.
Figure 4: Valid pattern for the tree in Figure 3. Indirect ancestor-descendant relationships are marked with dashed lines.
Figure 5: Prediction accuracy for suggesting instances of Pylint annotations where training and test data are equally split. The numbers on the right are the total number of annotations and instances respectively. The "Combined" test evaluated ECHO on the entire set of submissions for all exercises.
...and 11 more figures
