RePurr: Automated Repair of Block-Based Learners' Programs

Sebastian Schweikl, Gordon Fraser

TL;DR

This work tackles the problem of providing automated feedback for block-based Scratch programs, where semantic errors and incomplete tasks thwart novices. It introduces RePurr, the first APR system for Scratch, using evolutionary search guided by a refined fault-localization strategy and augmented by alternative fix sources to bypass the plastic surgery limitation. The approach defines a detailed fitness function that leverages assertion-level information, develops Scratch-specific crossover and mutation operators, and applies parallelized test evaluation to manage long-running Whisker tests. Empirical results on two Scratch datasets show that repair can improve learner programs and yield partial or full fixes, enabling automatic hint generation, though the plastic surgery hypothesis does not hold in general for student code unless diverse fix sources are used. The findings highlight the potential of search-based repair to support automated feedback in classrooms and MOOC-style settings, while outlining avenues for hybridization with data-driven hints and language-model-based strategies to reduce runtime and improve effectiveness.

Abstract

Programming is increasingly taught using block-based languages like Scratch. While the use of blocks prevents syntax errors, learners can still make semantic mistakes, requiring feedback and help. As teachers may be overwhelmed by help requests in a classroom, may lack programming expertise themselves, or may be unavailable in independent learning scenarios, automated hint generation is desirable. Automated program repair (APR) can provide the foundation for this, but relies on multiple assumptions: (1) APR usually targets isolated bugs, but learners may fundamentally misunderstand tasks or request help for substantially incomplete code. (2) Software tests are required to guide the search and localize broken blocks, but tests for block-based programs differ from those in past APR research: they consist of system tests, and a very small number of them already covers the code fully. At the same time, they have vastly longer runtimes due to animations and interactions in Scratch programs, which inhibits the applicability of search. (3) The plastic surgery hypothesis assumes the code necessary for repairs already exists in the codebase. Block-based programs tend to be small and may lack this redundancy. To study whether APR of such programs is still feasible, we introduce, to the best of our knowledge, the first APR approach for Scratch based on evolutionary search. Our RePurr prototype includes novel refinements of fault localization to improve the guidance of test suites, recovers the plastic surgery hypothesis by exploiting that learning scenarios provide model and student solutions, and reduces the costs of fitness evaluations via test parallelization and acceleration. Empirical evaluation on a set of real learners' programs confirms the anticipated challenges, but also demonstrates that APR can still effectively improve and fix learners' programs, enabling automated generation of hints and feedback.

Paper Structure

This paper contains 40 sections, 2 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Deleting expressions creates unoccupied holes
  • Figure 2: Deleting a C block and its nested statements
  • Figure 3: Deleting a C block without its nested blocks
  • Figure 4: Moving the blue stack block to a new location
  • Figure 5: Characteristics of the faulty program variants in the Complex dataset
  • ...and 5 more figures