
Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments

Qianhui Zhao, Fang Liu, Li Zhang, Yang Liu, Zhen Yan, Zhenghao Chen, Yufei Zhou, Jing Jiang, Ge Li

TL;DR

This work tackles the challenge of repairing advanced student programming assignments by introducing Defects4DS, a dataset with higher complexity than introductory tasks, and PaR, a novel framework that leverages large language models. PaR operates in three phases (Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair), using a composite Peer Solution Match score to select semantically and syntactically relevant peer solutions and to craft informative prompts for repair. Empirical evaluation on Defects4DS and ITSP shows state-of-the-art repair performance across multiple backbone models, with repair-rate improvements of 19.94% and 15.2% over prior LLM- and symbolic-based methods, respectively. The results demonstrate PaR’s robustness, generalization, and educational potential, and the authors release Defects4DS and replication materials to foster future research in advanced programming-education analytics.

Abstract

Automated generation of feedback on programming assignments holds significant benefits for programming education, especially for advanced assignments. Automated Program Repair techniques, particularly those based on Large Language Models (LLMs), have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple, and it remains unclear how existing approaches perform when repairing programs from higher-level programming courses. To address this limitation, we curate a new advanced student assignment dataset named Defects4DS from a higher-level programming course. Subsequently, we identify the challenges related to fixing bugs in advanced assignments. Based on the analysis, we develop an LLM-powered framework called PaR. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies closely related peer programs based on lexical, semantic, and syntactic criteria. Multi-Source Prompt Generation then combines multiple sources of information to create a comprehensive and informative prompt for the final Program Repair stage. The evaluation on Defects4DS and the well-investigated ITSP dataset shows that PaR achieves new state-of-the-art performance, with improvements of 19.94% and 15.2% in repair rate over prior state-of-the-art LLM- and symbolic-based approaches, respectively.
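The Peer Solution Selection phase can be illustrated with a small sketch. The abstract states only that selection uses lexical, semantic, and syntactic criteria; the concrete similarity functions, the equal weights, and every name below (`lexical_sim`, `syntactic_sim`, `semantic_sim`, `psm_score`, `select_peer`) are assumptions made for illustration and are not the paper's actual Peer Solution Match formula.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword list used only to separate structure from identifiers.
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "struct"}


def tokenize(src):
    """Split source text into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)


def lexical_sim(a, b):
    """Assumed lexical proxy: Jaccard overlap of the two token sets."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0


def shape(src):
    """Abstract identifiers and literals so that only code structure remains."""
    out = []
    for tok in tokenize(src):
        if tok in C_KEYWORDS:
            out.append(tok)
        elif re.match(r"[A-Za-z_]", tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def syntactic_sim(a, b):
    """Assumed syntactic proxy: sequence similarity of abstracted tokens."""
    return SequenceMatcher(None, shape(a), shape(b), autojunk=False).ratio()


def semantic_sim(outputs_a, outputs_b):
    """Assumed semantic proxy: share of test inputs with identical output."""
    n = max(len(outputs_a), len(outputs_b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(outputs_a, outputs_b)) / n


def psm_score(buggy, peer, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three proxies; the weights are an assumption."""
    w_lex, w_syn, w_sem = weights
    return (
        w_lex * lexical_sim(buggy["code"], peer["code"])
        + w_syn * syntactic_sim(buggy["code"], peer["code"])
        + w_sem * semantic_sim(buggy["outputs"], peer["outputs"])
    )


def select_peer(buggy, peers):
    """Return the peer program with the highest composite score."""
    return max(peers, key=lambda p: psm_score(buggy, p))
```

Each program here is a dictionary with a `code` string and an `outputs` list recorded on shared test inputs. A real implementation would likely replace the token-based proxies with AST or data-flow comparisons, and the selected peer would then feed into the prompt built in the Multi-Source Prompt Generation phase.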

Paper Structure

This paper contains 36 sections, 3 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Bug type distribution of Defects4DS. A single bug sometimes corresponds to multiple bug types.
  • Figure 2: Distribution of bug types and repair types for Prob. 1-4.
  • Figure 3: Comparison of the distribution of repair types and bug types in the annotated results between Defects4DS and ITSP. To provide a more visually intuitive representation of the bug types, we utilize the same color scheme for different subcategories within the same main category.
  • Figure 4: The architecture of PaR.
  • Figure 5: Example of the additional part of the prompt when bug-related information is considered. We use $l_{i}$, $\mathit{bt}_{i}$, and $\mathit{rt}_{i}$ $(i = 1, 2, 3)$ to represent bug lines, bug types, and repair types, respectively. The text in Bug Line is used in PaR w/BL in RQ1, while the remaining information is only used in RQ2.
  • ...and 5 more figures