INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair

Hanbin Wang; Zhenghao Liu; Shuo Wang; Ganqu Cui; Ning Ding; Zhiyuan Liu; Ge Yu

TL;DR

INTERVENOR introduces a two-agent interactive framework that uses compiler feedback to guide iterative code repair, addressing the Degeneration-of-Thought (DoT) problem in LLMs. The Code Learner and the Code Teacher collaboratively generate and repair code through a Chain-of-Repair (CoR) made of bug diagnoses and repair plans, executed in an A/B/C turn-based loop. Empirical results show substantial improvements over GPT-3.5 in code generation and code translation, and ablations underscore the value of both compiler feedback and CoR. The work also introduces the CodeError dataset to benchmark repair capabilities and uses case studies to show that INTERVENOR diagnoses and fixes common errors effectively.

Abstract

This paper introduces INTERVENOR (INTERactiVE chaiN Of Repair), a system that emulates the interactive way humans repair code, covering both code diagnosis and code repair. INTERVENOR prompts Large Language Models (LLMs) to play two distinct roles during code repair: a Code Learner and a Code Teacher. The Code Learner follows instructions to generate or repair code, while the Code Teacher crafts a Chain-of-Repair (CoR) that guides the Code Learner. When generating the CoR, the Code Teacher checks the code produced by the Code Learner and reassesses how to address its bugs based on the error feedback received from compilers. Experimental results show that INTERVENOR surpasses baseline models, improving over GPT-3.5 by approximately 18% in code generation and 4.3% in code translation. Our further analyses show that CoR effectively explains the reasons behind bugs and outlines solution plans in natural language. With feedback from code compilers, INTERVENOR can accurately identify syntax errors and assertion errors and provide precise instructions for repairing code. All data and code are available at https://github.com/NEUIR/INTERVENOR
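To make the Learner/Teacher interaction concrete, here is a minimal sketch of such a repair loop, assuming the task comes with executable test assertions and that running the code stands in for compiler feedback. The names `interactive_repair`, `run_with_feedback`, `toy_learner` and `toy_teacher`, the agent signatures, and the `max_turns` budget are all hypothetical illustrations, not the authors' implementation. In INTERVENOR both roles are played by prompted LLMs, whereas here they are toy stubs.

```python
from typing import Callable, Optional, Tuple

# Hypothetical agent signatures (in INTERVENOR both roles are prompted LLMs).
Learner = Callable[[str, Optional[str], Optional[str]], str]  # (task, previous code, CoR) -> code
Teacher = Callable[[str, str, str], str]                      # (task, code, feedback) -> CoR


def run_with_feedback(code: str, tests: str) -> Optional[str]:
    """Run the candidate plus its test assertions. Returns None if every test
    passes, otherwise an error message standing in for compiler feedback."""
    try:
        exec(code + "\n" + tests, {})
    except Exception as exc:  # SyntaxError, AssertionError, NameError, ...
        return f"{type(exc).__name__}: {exc}"
    return None


def interactive_repair(task: str, learner: Learner, teacher: Teacher,
                       tests: str, max_turns: int = 3) -> Tuple[str, bool]:
    """Generate code, then alternate Teacher diagnosis and Learner repair."""
    code = learner(task, None, None)          # initial generation, no CoR yet
    for _ in range(max_turns):
        feedback = run_with_feedback(code, tests)
        if feedback is None:
            return code, True                 # tests pass: stop repairing
        cor = teacher(task, code, feedback)   # Chain-of-Repair: diagnosis + plan
        code = learner(task, code, cor)       # Learner repairs following the CoR
    return code, run_with_feedback(code, tests) is None


# Toy stand-ins for the two LLM agents, for demonstration only.
def toy_learner(task: str, prev_code: Optional[str], cor: Optional[str]) -> str:
    if cor is None:
        return "def add(a, b):\n    return a - b\n"   # buggy first attempt
    return "def add(a, b):\n    return a + b\n"       # repaired after reading the CoR


def toy_teacher(task: str, code: str, feedback: str) -> str:
    return f"Diagnosis: {feedback}. Plan: add(a, b) subtracts; use '+' instead of '-'."


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5"
    code, passed = interactive_repair("Write add(a, b).", toy_learner, toy_teacher, tests)
    print(passed)  # True
    print(code)
```

Here the first attempt fails its assertion, the Teacher turns the `AssertionError` into a diagnosis and plan, and the Learner's second attempt passes. Bounding the number of turns with `max_turns` mirrors the paper's study of how many repair turns help (Figure 3), though the default of 3 here is an arbitrary choice.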

Paper Structure

This paper contains 21 sections, 19 figures, and 10 tables.

Introduction
Related Work
Methodology
Preliminary of Code Repair
Interactive Chain-of-Repair (CoR)
Agent Building
Interactive Code Repair Workflow
Experimental Methodology
Evaluation Results
Overall Performance
Ablation Studies
Effectiveness of INTERVENOR in Different Testing Scenarios
Case Studies
Conclusion
Appendix
...and 6 more sections

Figures (19)

Figure 1: Illustration of INTERVENOR. INTERVENOR contains two agents, the teacher and the student, who collaborate to repair the code. Error messages are used as a kind of INTERVENOR to alleviate the Degeneration-of-Thought (DoT) problem.
Figure 2: Illustration of Our Interactive Chain-of-Repair Model (INTERVENOR).
Figure 3: The Impact of Different Code Repair Turns. HumanEval, MBPP, and HumanEval-X (HEX) are used to evaluate our INTERVENOR model.
Figure 4: Code Repair Performance on the CodeError Dataset. Erroneous code is repaired in a single turn. The code samples are divided into two groups to evaluate repair effectiveness: Assertion Errors and Others (AttributeError, NameError, RecursionError, SyntaxError and TypeError).
Figure 5: Case Studies. We provide two cases that showcase the effectiveness of the Chain-of-Repair (CoR) generated by INTERVENOR when fixing AttributeError and AssertionError, respectively.
...and 14 more figures
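The two-group split used in Figure 4 can be reproduced for any candidate by running it and recording which exception class it raises. The sketch below is a minimal illustration under that assumption. The helper names `error_type` and `error_group`, the "Unlisted" bucket for types Figure 4 does not name, and the sample programs are all hypothetical and are not drawn from the CodeError dataset.

```python
from collections import Counter
from typing import Optional

# Error types that Figure 4 groups under "Others".
OTHER_TYPES = {"AttributeError", "NameError", "RecursionError", "SyntaxError", "TypeError"}


def error_type(code: str, tests: str) -> Optional[str]:
    """Return the exception class name raised by code + tests, or None if it runs cleanly."""
    try:
        exec(code + "\n" + tests, {})
    except Exception as exc:
        return type(exc).__name__
    return None


def error_group(name: Optional[str]) -> Optional[str]:
    """Map an exception name onto the two groups reported in Figure 4."""
    if name is None:
        return None                      # passing code is not an error sample
    if name == "AssertionError":
        return "Assertion Errors"
    if name in OTHER_TYPES:
        return "Others"
    return "Unlisted"                    # hypothetical bucket for unnamed types


# Hypothetical erroneous samples (not taken from CodeError).
samples = [
    ("def f(s):\n    return s.lenght\n", "f('a')"),          # AttributeError
    ("def g():\n    return undefined_name\n", "g()"),         # NameError
    ("def h(n):\n    return n * 3\n", "assert h(2) == 5"),    # AssertionError
    ("def k(:\n    pass\n", ""),                              # SyntaxError
    ("def ok(a):\n    return a\n", "assert ok(1) == 1"),      # passes
]

counts = Counter(
    group for group in (error_group(error_type(c, t)) for c, t in samples) if group
)
print(counts)  # Counter({'Others': 3, 'Assertion Errors': 1})
```

Assertion errors are isolated because they signal wrong behavior in code that otherwise runs. The "Others" group collects failures the interpreter reports before or during execution, where the error message itself usually points at the faulty line.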
