Exploring Parameter-Efficient Fine-Tuning of Large Language Model on Automated Program Repair

Guochang Li; Chen Zhi; Jialiang Chen; Junxiao Han; Shuiguang Deng

TL;DR

This work enhances the existing APR dataset by employing prompt engineering to create an instruction dataset, APR-Instruction, and fine-tunes four pre-trained LLMs using four different PEFT methods with APR-Instruction, providing a comprehensive exploration of Parameter-Efficient Fine-Tuning on APR.

Abstract

Automated Program Repair (APR) aims to fix bugs by generating patches. Existing work has demonstrated that the "pre-training and fine-tuning" paradigm enables Large Language Models (LLMs) to improve their fixing capabilities on APR. However, existing work mainly focuses on Full-Model Fine-Tuning (FMFT) for APR, and limited research has been conducted on the execution-based evaluation of Parameter-Efficient Fine-Tuning (PEFT) for APR. Compared to FMFT, PEFT can reduce computing resource consumption without compromising performance and has been widely adopted in other software engineering tasks. To fill this gap, we first enhance the existing APR dataset by employing prompt engineering to create an instruction dataset, APR-INSTRUCTION. Second, we fine-tune four pre-trained LLMs using four different PEFT methods with APR-INSTRUCTION. The best fine-tuned model fixes 58% more bugs than the state-of-the-art LLM-based APR techniques. The results also show that $(IA)^3$ improves the creativity of LLMs more effectively through fine-tuning and achieves the highest fixing capability compared to the other three PEFT methods. Third, we explore the optimal configuration of PEFT hyperparameters and assess the impact of instruction dataset size, showing that a larger number of parameters and a larger training dataset do not necessarily result in better performance for PEFT. Lastly, we analyze peak memory usage and trainable parameters to show the efficiency of PEFT. This work provides a comprehensive exploration of PEFT on APR and suggests potentially promising directions for extension to other software engineering downstream tasks. APR-INSTRUCTION, PEFT weights, and the fine-tuning code are publicly available as open-source resources.
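
To make the point about trainable parameters concrete, the following is a minimal sketch that counts the extra parameters LoRA and $(IA)^3$ would add to a decoder-only model. The model dimensions, the LoRA rank, and the choice of adapted projections are illustrative assumptions, not the configuration used in the paper; the helper names are hypothetical.

```python
# Rough trainable-parameter counts for two PEFT methods.
# All dimensions below are ASSUMED for illustration (a 7B-class
# decoder layout); they are not taken from the paper.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two low-rank matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

def ia3_params(d_key: int, d_value: int, d_ff: int) -> int:
    """(IA)^3 adds one scaling vector each for keys, values,
    and the feed-forward intermediate activations."""
    return d_key + d_value + d_ff

HIDDEN = 4096      # assumed hidden size
FFN = 11008        # assumed feed-forward intermediate size
LAYERS = 32        # assumed number of transformer layers
RANK = 8           # assumed LoRA rank

# Assumption: LoRA is applied to the query and value projections only.
lora_total = LAYERS * 2 * lora_params(HIDDEN, HIDDEN, RANK)
ia3_total = LAYERS * ia3_params(HIDDEN, HIDDEN, FFN)

print(f"LoRA (r={RANK}): {lora_total:,} trainable parameters")
print(f"(IA)^3:        {ia3_total:,} trainable parameters")
```

Under these assumptions, $(IA)^3$ trains several times fewer parameters than LoRA, which illustrates why parameter count alone need not predict fixing capability.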

Paper Structure (33 sections, 1 equation, 7 figures, 8 tables)

Introduction
Experiment Design
Overview
Studied Large Language Models
APR-Instruction Construction
Data Source
Data Construction
Impact of APR-Instruction
Evaluation of overlap between APR-Instruction and benchmarks
Evaluation of APR-Instruction validity
PEFT Methods and Implementation Details
Benchmarks and Evaluation Metrics of APR
Three existing benchmarks
Patch generation and validation
Implementation Details
...and 18 more sections

Figures (7)

Figure 1: Overview of APR-Instruction construction, supervised fine-tuning of four LLMs with four PEFT methods, and evaluation on three benchmarks
Figure 2: Principles of LoRA, p-tuning, prefix-tuning, and $(IA)^3$ on LLMs. Red parts represent the trainable modules of each PEFT method.
Figure 3: Infilling template of CodeLlama-7B
Figure 4: Efficiency of memory and parameters with different fine-tuning methods on CodeLlama-7B
Figure 5: $pass@k$ on Defects4J with different PEFT methods
...and 2 more figures
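
For readers unfamiliar with the $pass@k$ metric named in Figure 5, here is a minimal sketch of the unbiased estimator commonly used in code-generation evaluation. It is an assumption that the paper uses this exact formulation; the function name and the sample numbers are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled patches, of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least
    one of k patches drawn without replacement from the n samples is correct.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 patches sampled for one bug, 3 pass the tests.
for k in (1, 5, 10):
    print(f"pass@{k} = {pass_at_k(n=10, c=3, k=k):.3f}")
```

Averaging this value over all bugs in a benchmark gives the benchmark-level $pass@k$.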
