The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model

Zelin Zhao, Zhaogui Xu, Jialong Zhu, Peng Di, Yuan Yao, Xiaoxing Ma

TL;DR

Experimental results demonstrate a remarkable repair rate of 72.97% with the best prompt, highlighting a substantial improvement in the effectiveness and practicality of automatic repair techniques.

Abstract

Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. One key factor is the under-utilization of review comments, which provide valuable insights into defects and potential fixes. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments. This paper conducts a comprehensive investigation into the effective utilization of LLMs for repairing CR defects. In this study, various prompts are designed and compared across mainstream LLMs using two distinct datasets from human reviewers and automated checkers. Experimental results demonstrate a remarkable repair rate of 72.97% with the best prompt, highlighting a substantial improvement in the effectiveness and practicality of automatic repair techniques.

Paper Structure (26 sections, 8 figures, 5 tables)

This paper contains 26 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The code review process in industrial practice. Red lines denote the new steps and components to cooperate with CLM.
  • Figure 2: CR defects identified by reviewers.
  • Figure 3: CR defects identified by automated checker, PMD.
  • Figure 4: The template of prompts in this paper.
  • Figure 5: A buggy code snippet that was only fixed by Code-LLaMA with P7. [FIX_START] is before line 2 and [FIX_END] is after line 10; both are omitted for simplicity.
  • ...and 3 more figures