ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs

Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du, Qi Guo

TL;DR

ContrastRepair introduces a contrastive-feedback mechanism for conversation-driven automated program repair by pairing each failing test with a minimally different passing test. This contrastive pair, along with traceback context and dependent functions, guides an LLM to generate more precise patches through iterative prompts. Evaluations across Defects4J, QuixBugs, and HumanEval-Java show state-of-the-art repair performance and improved efficiency in API usage compared to prior ChatGPT-based approaches. The work highlights the value of test-pair selection, similarity-based prompting, and contextual information in enhancing bug localization and patch quality, offering a practical path toward more reliable APR with LLMs.

Abstract

Automated Program Repair (APR) aims to automatically generate patches for rectifying software bugs. Recent strides in Large Language Models (LLMs), such as ChatGPT, have yielded encouraging outcomes in APR, especially within the conversation-driven APR framework. Nevertheless, the efficacy of conversation-driven APR is contingent on the quality of the feedback information. In this paper, we propose ContrastRepair, a novel conversation-based APR approach that augments conversation-driven APR by providing LLMs with contrastive test pairs. A test pair consists of a failing test and a passing test, which together offer contrastive feedback to the LLM. Our key insight is to minimize the difference between the generated passing test and the given failing test, which can better isolate the root causes of bugs. By providing informative and specific feedback, ContrastRepair enables the LLM to produce effective bug fixes. The implementation of ContrastRepair is based on the state-of-the-art LLM, ChatGPT, and it iteratively interacts with ChatGPT until plausible patches are generated. We evaluate ContrastRepair on multiple benchmark datasets, including Defects4J, QuixBugs, and HumanEval-Java. The results demonstrate that ContrastRepair significantly outperforms existing methods, achieving a new state of the art in program repair. For instance, across Defects4J 1.2 and 2.0, ContrastRepair correctly repairs 143 of the 337 bug cases, while the best-performing baseline fixes 124 bugs.
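
The idea of pairing a failing test with the passing test that differs from it the least can be illustrated with a small sketch. It is not the authors' implementation: the function names `similarity` and `pick_contrastive_pair` are hypothetical, test inputs are treated as plain strings, and Python's `difflib.SequenceMatcher` ratio is an assumed stand-in for whatever similarity measure ContrastRepair actually uses.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two test inputs.

    Assumption: a generic character-level ratio, used here only for illustration.
    """
    return SequenceMatcher(None, a, b).ratio()


def pick_contrastive_pair(failing_input: str, passing_inputs: list[str]) -> tuple[str, str]:
    """Pair the failing test with the most similar passing test (hypothetical helper)."""
    if not passing_inputs:
        raise ValueError("need at least one passing test to form a pair")
    # The closest passing test differs least from the failing one, so the
    # remaining difference points more directly at the cause of the failure.
    closest = max(passing_inputs, key=lambda p: similarity(failing_input, p))
    return failing_input, closest


# Made-up test inputs for a hypothetical buggy gcd function.
pair = pick_contrastive_pair(
    "gcd(0, 5)",
    ["gcd(1, 5)", "gcd(12, 18)", "sort([3, 1, 2])"],
)
print(pair)
```

In this toy case the passing test `gcd(1, 5)` is selected, so the only difference left between the two tests is the first argument. A pair like this would then be included in the prompt alongside the other feedback the abstract describes.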

Paper Structure

This paper contains 28 sections, 6 figures, 10 tables, 1 algorithm.

Figures (6)

  • Figure 1: A Motivating Example
  • Figure 2: Overview of ContrastRepair
  • Figure 3: Illustration of Prompt Construction
  • Figure 4: An illustrative example of a bug uniquely fixed by ContrastRepair in Defects4J 1.2.
  • Figure 5: Bug fix Venn diagram on D4J1.2.
  • ...and 1 more figure