RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

André Silva; Sen Fang; Martin Monperrus

TL;DR

RepairLLaMA introduces a focused pipeline for automated program repair (APR) that combines APR-specific code representations with parameter-efficient fine-tuning (PEFT) via LoRA adapters. By addressing both representation and training efficiency, the approach enables a smaller model to outperform baselines and even GPT-4 on several Java benchmarks. Empirical results on Defects4J v2, HumanEval-Java, and GitBug-Java show strong patch quality across plausible, exact-match, AST-match, and semantic-match metrics, as well as robust performance on multi-location bugs. The work highlights the importance of domain-specific representations and PEFT for scalable, high-quality APR, and it provides open-source tooling to foster reproducibility.

Abstract

Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions that remain unexplored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers the use of a state-of-the-art parameter-efficient fine-tuning (PEFT) technique for program repair. As a result, RepairLLaMA produces a highly effective "program repair adapter" for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program-repair-specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps fine-tuning converge and clearly contributes to the effectiveness of RepairLLaMA in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs, outperforming all baselines.
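
The TL;DR names LoRA as the PEFT technique behind the adapter. As a rough illustration of why such an adapter is cheap to train, the sketch below applies the standard LoRA update, W + (alpha / r) · B · A, to a single frozen weight matrix and compares trainable parameter counts. This is not the RepairLLaMA implementation: the helper names, the matrix sizes, the rank `r`, and the scaling factor `alpha` are all hypothetical values chosen for illustration.

```python
# Illustrative sketch of a LoRA-style low-rank update; NOT the RepairLLaMA code.
# All names, sizes, and hyperparameters below are hypothetical.
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Compare full fine-tuning with a rank-r adapter for one weight matrix."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}


d_out, d_in, r, alpha = 8, 8, 2, 4
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, small random init
b = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init

# With B initialised to zero, the adapter leaves the base model unchanged.
assert lora_effective_weight(w, a, b, alpha) == w

# Parameter counts for a hypothetical 4096 x 4096 projection with rank 8.
print(trainable_params(4096, 4096, 8))
```

Only `a` and `b` would receive gradients during fine-tuning, while `w` stays frozen, so the trained adapter can be stored and shipped separately from the base model.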

Paper Structure (36 sections, 6 figures, 4 tables)

This paper contains 36 sections, 6 figures, 4 tables.

Introduction
RepairLLaMA: Efficient Fine-Tuning for Program Repair
Overview
Target Bugs
Choice of the Initial LLM
Choice of Code Representations
Representation of Fault Localization
Input Representation Space
Output Representation Space
Input/Output Representation Pairs
Choice of Fine-Tuning Dataset
Program Repair Adapters for LLMs
Inference Time
Experimental Methodology
Research Questions
...and 21 more sections

Figures (6)

Figure 1: Overview of RepairLLaMA. The core novelties of RepairLLaMA are the APR specific code representations and the engineering of an effective program repair adapter that is plugged into the underlying LLM.
Figure 2: Buggy code of the multi-location bug Chart-5 represented in our four different input representations.
Figure 3: Patch for multi-location bug Chart-5 represented in our four different output representations.
Figure 4: The prompt used to prompt GPT-3.5 and GPT-4 as a strong baseline to generate patches.
Figure 5: Exact match patch generated by RepairLLaMA for Math-86 from Defects4J v2. In this multi-location bug, RepairLLaMA is able to fix two distant buggy locations.
...and 1 more figure
