ThinkRepair: Self-Directed Automated Program Repair

Xin Yin, Chao Ni, Shaohua Wang, Zhenhao Li, Limin Zeng, Xiaohu Yang

TL;DR

ThinkRepair introduces a self-directed, two-phase framework for automated program repair. An automated collection phase builds a knowledge pool of chains of thought, and a fixing phase then repairs bugs using few-shot examples drawn from that pool together with interactive test-failure feedback to the LLM. The approach significantly improves repair effectiveness over both NMT-based and prior LLM-based APRs on Defects4J and QuixBugs, and maintains practical efficiency by limiting the number of patches generated per bug. Extensive analyses show that the method benefits from CoT reasoning, strategic example selection, and iterative test-driven refinement, while experiments on real-world bug data indicate practical applicability beyond standard benchmarks. Overall, ThinkRepair demonstrates strong reasoning-backed repair capabilities with a flexible, LLM-agnostic design and robust performance gains across multiple datasets and backends.

Abstract

Though many approaches have been proposed for Automated Program Repair (APR) and have indeed achieved remarkable performance, they still have limitations in fixing bugs that require analyzing and reasoning about the logic of the buggy program. Recently, large language models (LLMs) instructed through prompt engineering have attracted much attention for their powerful ability to address many kinds of tasks, including bug fixing. However, the quality of the prompt strongly affects the ability of LLMs, and manually constructing high-quality prompts is costly. To address this limitation, we propose a self-directed LLM-based automated program repair approach, ThinkRepair, with two main phases: a collection phase and a fixing phase. The former automatically collects various chains of thought that constitute pre-fixed knowledge by instructing LLMs with a Chain-of-Thought (CoT) prompt. The latter fixes a bug by first selecting examples for few-shot learning and then automatically interacting with LLMs, optionally appending feedback from test results. Evaluations on two widely studied datasets (Defects4J and QuixBugs) that compare ThinkRepair with 12 SOTA APRs indicate the superiority of ThinkRepair in fixing bugs. Notably, ThinkRepair fixes 98 bugs and improves on the baselines by 27%-344.4% on Defects4J V1.2. On Defects4J V2.0, ThinkRepair fixes 12-65 more bugs than the SOTA APRs. Additionally, ThinkRepair makes a considerable improvement on QuixBugs (31 for Java and 21 for Python at most).
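
The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Bug`, `Example`, `collect_knowledge`, `select_examples`, `fix_bug`, `k`, `max_rounds`) is hypothetical, the LLM is modeled as a plain function from a prompt to a (reasoning, patch) pair, and the token-overlap ranking is only a stand-in because the abstract does not say how examples are selected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types: an LLM maps a prompt to (reasoning, patch); a test
# runner maps a patch to None on success or to a failure message.
LLM = Callable[[str], Tuple[str, str]]
TestRunner = Callable[[str], Optional[str]]


@dataclass
class Bug:
    code: str
    run_tests: TestRunner


@dataclass
class Example:
    buggy_code: str
    chain_of_thought: str
    fixed_code: str


def collect_knowledge(bugs: List[Bug], llm: LLM) -> List[Example]:
    """Collection phase: keep the reasoning only when the patch passes tests."""
    pool = []
    for bug in bugs:
        prompt = f"Think step by step, then fix this function:\n{bug.code}\n"
        thought, patch = llm(prompt)
        if bug.run_tests(patch) is None:
            pool.append(Example(bug.code, thought, patch))
    return pool


def select_examples(pool: List[Example], buggy: str, k: int) -> List[Example]:
    """Pick k examples by token overlap (an assumed, simplistic criterion)."""
    target = set(buggy.split())

    def score(ex: Example) -> float:
        other = set(ex.buggy_code.split())
        union = target | other
        return len(target & other) / len(union) if union else 0.0

    return sorted(pool, key=score, reverse=True)[:k]


def fix_bug(bug: Bug, pool: List[Example], llm: LLM,
            k: int = 2, max_rounds: int = 5) -> Optional[str]:
    """Fixing phase: few-shot prompt, then retry with test-failure feedback."""
    shots = select_examples(pool, bug.code, k)
    prompt = "".join(
        f"Buggy:\n{ex.buggy_code}\nReasoning:\n{ex.chain_of_thought}\n"
        f"Fixed:\n{ex.fixed_code}\n\n"
        for ex in shots
    ) + f"Buggy:\n{bug.code}\n"
    for _ in range(max_rounds):
        _, patch = llm(prompt)
        failure = bug.run_tests(patch)
        if failure is None:
            return patch  # first patch that passes the tests
        # Append the failure so the next attempt can take it into account.
        prompt += f"The previous patch failed: {failure}\nTry again.\n"
    return None  # round budget exhausted without a passing patch
```

Capping the loop with `max_rounds` mirrors the idea of limiting the number of patches per bug; the actual budget and prompt wording used by ThinkRepair are not given in this summary.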

Paper Structure (31 sections, 9 figures, 10 tables, 1 algorithm)


Figures (9)

  • Figure 1: A code logic error in the Closure project (https://storage.googleapis.com/google-code-archive/v2/code.google.com/closure-compiler/issues/issue-538.json)
  • Figure 2: Overview of ThinkRepair
  • Figure 3: The Process of the Collection Phase
  • Figure 4: The Process of the Fixing Phase
  • Figure 5: RQ1: Bug-fixing Venn diagram on Defects4J V1.2 of ThinkRepair, BaseChatGPT and AlphaRepair
  • ...and 4 more figures