Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments
Qianhui Zhao, Fang Liu, Li Zhang, Yang Liu, Zhen Yan, Zhenghao Chen, Yufei Zhou, Jing Jiang, Ge Li
TL;DR
This work tackles the repair of advanced student programming assignments by introducing Defects4DS, a dataset of programs more complex than those in introductory tasks, and PaR, a novel framework that leverages large language models. PaR operates in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. It uses a composite Peer Solution Match score to select semantically and syntactically relevant peer solutions and to craft informative prompts for repair. Empirical evaluation on Defects4DS and ITSP shows state-of-the-art repair performance across multiple backbone models, with repair-rate improvements of about 19.94% and 15.2% over prior LLM-based and symbolic approaches, respectively. The results demonstrate PaR's robustness, generalization, and educational potential, and the authors release Defects4DS and replication materials to foster future research in advanced programming-education analytics.
Abstract
Automated generation of feedback on programming assignments holds significant benefits for programming education, especially for advanced assignments. Automated Program Repair techniques, particularly those based on Large Language Models (LLMs), have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple, and it remains unclear how existing approaches perform when repairing programs from higher-level programming courses. To address these limitations, we curate a new advanced student assignment dataset named Defects4DS from a higher-level programming course. We then identify the challenges of fixing bugs in advanced assignments. Based on this analysis, we develop an LLM-powered framework called PaR. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies closely related peer programs based on lexical, semantic, and syntactic criteria. Multi-Source Prompt Generation then combines multiple sources of information into a comprehensive and informative prompt for the final Program Repair stage. Evaluation on Defects4DS and the widely studied ITSP dataset shows that PaR achieves new state-of-the-art performance, improving the repair rate by 19.94% and 15.2% over prior state-of-the-art LLM-based and symbolic approaches, respectively.
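As a rough illustration of the first two phases, the sketch below ranks peer solutions by a composite similarity score and assembles a repair prompt. It is not the authors' implementation. The three similarity measures are simple stand-ins chosen for this example (token-sequence matching, bag-of-identifiers cosine, and cosine over abstracted token bigrams), the equal weights are arbitrary, and the prompt sources (problem description, failing test) are assumptions, since the abstract does not specify them. All function names are hypothetical.

```python
import difflib
import re
from collections import Counter
from math import sqrt

# Small keyword set for the illustration only; not a full C keyword list.
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "struct"}


def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def cosine(c1, c2):
    """Cosine similarity between two Counters; 0.0 if either is empty."""
    dot = sum(c1[k] * c2[k] for k in c1)
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def lexical_sim(a, b):
    """Stand-in lexical criterion: token-sequence matching ratio."""
    return difflib.SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def semantic_sim(a, b):
    """Stand-in semantic criterion: cosine over word-like tokens.
    A real system would more likely compare learned code embeddings."""
    def words(code):
        return Counter(t for t in tokenize(code) if t[0].isalpha() or t[0] == "_")
    return cosine(words(a), words(b))


def syntactic_sim(a, b):
    """Stand-in syntactic criterion: cosine over bigrams of abstracted tokens
    (identifiers become ID, numbers become NUM), approximating program shape."""
    def shape(code):
        abstract = []
        for t in tokenize(code):
            if t in C_KEYWORDS:
                abstract.append(t)
            elif t[0].isalpha() or t[0] == "_":
                abstract.append("ID")
            elif t.isdigit():
                abstract.append("NUM")
            else:
                abstract.append(t)
        return Counter(zip(abstract, abstract[1:]))
    return cosine(shape(a), shape(b))


def match_score(buggy, peer, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Composite score: weighted sum of the three criteria (weights are assumed)."""
    sims = (lexical_sim(buggy, peer), semantic_sim(buggy, peer), syntactic_sim(buggy, peer))
    return sum(w * s for w, s in zip(weights, sims))


def select_peers(buggy, peers, k=1):
    """Return the k peer solutions with the highest composite score."""
    return sorted(peers, key=lambda p: match_score(buggy, p), reverse=True)[:k]


def build_prompt(description, buggy, peers, failing_test):
    """Combine several information sources into one prompt (sources are assumed)."""
    parts = ["Problem description:", description, "Buggy program:", buggy]
    for i, peer in enumerate(peers, 1):
        parts += [f"Correct peer solution {i}:", peer]
    parts += ["Failing test case:", failing_test, "Return the repaired program."]
    return "\n".join(parts)
```

In use, `select_peers(buggy_code, correct_submissions, k=1)` would pick the reference solution, and the string returned by `build_prompt` would be sent to whichever LLM backs the Program Repair phase.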
