Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair

Yuxiang Wei; Chunqiu Steven Xia; Lingming Zhang

TL;DR

Repilot addresses a core shortcoming of LLM-based APR: LLMs generate patches token by token without regard for the target language's semantics. It fuses LLM generation with a semantics-aware Completion Engine, frames patch synthesis as a cloze-style repair task, and uses the Completion Engine to prune infeasible tokens and proactively complete code, with memorization to speed up the search. Empirically, Repilot achieves state-of-the-art bug-fixing performance on Defects4J 1.2 (66 correct fixes) and 2.0 (50 correct fixes), with substantially higher patch compilability than prior methods. The framework is model- and language-agnostic at its core, and experiments show that it generalizes to different LLMs (CodeT5 and InCoder) and bug sets, offering a practical path to more reliable automated program repair and, more broadly, code generation.
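The cloze-style framing can be illustrated with a minimal sketch, assuming a line-level buggy hunk whose location is already known. The buggy lines are replaced by a single mask, and an infilling LLM is asked to generate the code for the mask, conditioned on the code both before and after it. The `MASK` string and the helpers `build_cloze_input` and `apply_patch` are hypothetical; real infilling models define their own sentinel tokens (T5-style models such as CodeT5 use `<extra_id_0>`).

```python
from typing import List

# Hypothetical mask placeholder; real infilling models use their own sentinel.
MASK = "<MASK>"


def build_cloze_input(lines: List[str], start: int, end: int) -> str:
    """Replace the buggy hunk lines[start:end] with a single mask line."""
    if not 0 <= start <= end <= len(lines):
        raise ValueError("invalid hunk range")
    return "\n".join(lines[:start] + [MASK] + lines[end:])


def apply_patch(lines: List[str], start: int, end: int, infill: str) -> str:
    """Splice the model's infill back in place of the removed hunk."""
    return "\n".join(lines[:start] + [infill] + lines[end:])


buggy = [
    "int max(int a, int b) {",
    "    return a < b ? a : b;",  # buggy: returns the minimum
    "}",
]
print(build_cloze_input(buggy, 1, 2))
```

The printed model input keeps the method signature and the closing brace as context, with `<MASK>` where the buggy `return` line was; a candidate infill such as `    return a < b ? b : a;` can then be spliced back with `apply_patch` and validated.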

Abstract

During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems written in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" that assist developers with various coding tasks, and they have also been applied directly to patch synthesis. However, most LLMs treat programs as sequences of tokens, so they are unaware of the underlying semantic constraints of the target programming language. As a result, many generated patches are statically invalid, which limits the practicality of the technique. We therefore propose Repilot, a general code generation framework that further copilots the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), much as humans write programs, and that this process can be significantly boosted and guided by a Completion Engine. Repilot synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes tokens based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely used Defects4J 1.2 and 2.0 datasets shows that Repilot outperforms state-of-the-art techniques by fixing 27% and 47% more bugs, respectively. Moreover, Repilot produces more valid and correct patches than the base LLM under the same budget. While this work focuses on applying Repilot to APR, the overall approach can also be generalized to other code generation tasks.
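The interaction between the LLM and the Completion Engine can be sketched as follows. This is a toy illustration under several assumptions, not Repilot's implementation: the "LLM" is a hypothetical table of ranked token proposals decoded greedily (rather than sampled from a probability distribution), and the "Completion Engine" is a hypothetical class that knows a finite set of valid programs, so an empty answer means no valid program can follow. The sketch shows pruning (infeasible top-ranked tokens are skipped), active completion (when exactly one continuation exists, it is appended without consulting the LLM), and memorization of rejected and accepted (prefix, token) pairs, which lets a repeated search skip engine queries. All names (`ToyCompletionEngine`, `Memo`, `is_feasible`, `guided_decode`, `toy_llm`, `EOS`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence token


class ToyCompletionEngine:
    """Hypothetical stand-in for a Completion Engine over a finite set of
    valid programs; an empty answer means no valid program has this prefix."""

    def __init__(self, valid_programs: Set[str]) -> None:
        self.valid = set(valid_programs)
        self.queries = 0  # counts engine calls, to show the effect of memorization

    def completions(self, prefix: str) -> Set[str]:
        """Every suffix s such that prefix + s is a valid program."""
        self.queries += 1
        return {p[len(prefix):] for p in self.valid if p.startswith(prefix)}


@dataclass
class Memo:
    # (prefix, token) pairs already known to be infeasible / feasible.
    rejected: Set[Tuple[str, str]] = field(default_factory=set)
    accepted: Set[Tuple[str, str]] = field(default_factory=set)


def is_feasible(engine: ToyCompletionEngine, memo: Memo, prefix: str, tok: str) -> bool:
    key = (prefix, tok)
    if key in memo.rejected:
        return False
    if key in memo.accepted:
        return True
    if tok == EOS:
        ok = "" in engine.completions(prefix)  # may stop only on a complete program
    else:
        ok = bool(engine.completions(prefix + tok))  # some valid program extends it
    (memo.accepted if ok else memo.rejected).add(key)
    return ok


def guided_decode(llm: Callable[[str], List[str]], engine: ToyCompletionEngine,
                  prefix: str, memo: Memo, max_steps: int = 20) -> Optional[str]:
    for _ in range(max_steps):
        # Pruning: take the best-ranked candidate that the engine does not reject.
        chosen = next((t for t in llm(prefix) if is_feasible(engine, memo, prefix, t)), None)
        if chosen is None:
            return None  # every candidate was pruned
        if chosen == EOS:
            return prefix
        prefix += chosen
        # Active completion: if exactly one continuation exists, append it directly.
        suggestions = engine.completions(prefix)
        if len(suggestions) == 1:
            prefix += next(iter(suggestions))
    return None


# Hypothetical ranked proposals of a toy LLM (best first).
TOY_PROPOSALS: Dict[str, List[str]] = {
    "obj.": ["getname", "getN", "size"],
    "obj.getN": ["ame", "umber"],
}


def toy_llm(prefix: str) -> List[str]:
    return TOY_PROPOSALS.get(prefix, [EOS])


VALID = {"obj.getName();", "obj.getNumber();", "obj.size();"}
engine = ToyCompletionEngine(VALID)
memo = Memo()
print(guided_decode(toy_llm, engine, "obj.", memo), engine.queries)
```

Here the toy LLM's top proposal `getname` is pruned because no valid program starts with `obj.getname`, and `();` is filled in by active completion, so the call prints `obj.getName(); 6`. Repeating the same search with the same `Memo` answers every feasibility check from the memo and needs only the two active-completion queries.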

Paper Structure (32 sections, 4 theorems, 17 equations, 8 figures, 4 tables, 3 algorithms)

Introduction
Background and related work
Large Language Models for Code
Code Completion
Automated Program Repair
Preliminaries
Languages with Static Checking
Abstraction of Completion Engines
Abstraction of LLMs
Approach
Overview
Completion-Guided Search Space Pruning
Memorization for Faster Search
Memorizing rejected tokens
Memorizing accepted tokens
...and 17 more sections

Key Result

Lemma 1 (Soundness of Pruning)

The tokens pruned away by the pruning algorithm (Guided-Prune) result in infeasible programs.
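The guarantee of Lemma 1 can be sanity-checked by brute force on a toy language. This sketch assumes that static feasibility (Definition 2) means some completion turns the partial program into a valid one, and that a strict engine (Definition 4) only rejects partial programs that are truly infeasible. Balanced parentheses stand in for a real language's static checker, `engine_rejects` is a hypothetical strict engine, and the suffix search is bounded so the check terminates; it exercises the lemma on small inputs rather than proving it.

```python
from itertools import product

TOKENS = ("(", ")")


def is_valid(program: str) -> bool:
    """Complete-program check: balanced parentheses."""
    depth = 0
    for ch in program:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def engine_rejects(partial: str) -> bool:
    """Hypothetical strict engine: rejects only when a ')' has no matching '('.
    Such a prefix can never be completed, so every rejection is justified."""
    depth = 0
    for ch in partial:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return True
    return False


def feasible_within(partial: str, max_suffix: int) -> bool:
    """Bounded static feasibility: some suffix of length <= max_suffix
    turns the partial program into a valid one."""
    return any(
        is_valid(partial + "".join(s))
        for n in range(max_suffix + 1)
        for s in product(TOKENS, repeat=n)
    )


def check_pruning_soundness(max_prefix: int = 6, max_suffix: int = 8) -> int:
    """Count pruned (prefix, token) pairs, asserting each one is infeasible."""
    pruned = 0
    for n in range(max_prefix + 1):
        for p in product(TOKENS, repeat=n):
            prefix = "".join(p)
            if engine_rejects(prefix):
                continue  # this prefix would never have been generated
            for tok in TOKENS:
                if engine_rejects(prefix + tok):  # the token is pruned
                    pruned += 1
                    assert not feasible_within(prefix + tok, max_suffix)
    return pruned


print(check_pruning_soundness())
```

The check returns 9: the only pruned steps are a `)` emitted at depth zero, after one of the 9 balanced prefixes of length at most 6, and none of them can be completed.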

Figures (8)

Figure 1: Limitations of existing LLM-based APR approaches.
Figure 2: Abstraction of a Completion Engine.
Figure 3: Abstraction of encoder-decoder based LLM.
Figure 4: Cloze-style program repair.
Figure 5: Overview of Repilot.
...and 3 more figures

Theorems & Definitions (9)

Definition 1: Programming Language with Static Checking
Definition 2: Static Feasibility of a Partial Program
Definition 3: Completion Engine
Definition 4: Strict Completion Engine
Definition 5: Large Language Model
Lemma 1: Soundness of Pruning
Lemma 2: Soundness of Memorization
Lemma 3: Soundness of Active Completion
Theorem 1: Overall Soundness
