RePair: Automated Program Repair with Process-based Feedback

Yuze Zhao; Zhenya Huang; Yixiao Ma; Rui Li; Kai Zhang; Hao Jiang; Qi Liu; Linbo Zhu; Yu Su

TL;DR

A reward model is developed that serves as a critic, providing feedback on the fine-tuned LM's actions and progressively optimizing its policy. The results show that the process-based approach not only outperforms larger outcome-based generation methods but also nearly matches the performance of closed-source commercial large-scale LMs.

Abstract

The tension between concerns about program reliability and the cost of repairs underscores the need for Automated Program Repair (APR). APR is instrumental in transforming vulnerable programs into more robust ones, bolstering program reliability while reducing the financial burden of manual repairs. Commercial-scale language models (LMs) have taken APR to unprecedented levels. However, their emergence also reveals that models with fewer than 100B parameters may struggle to achieve the desired effect with single-step modifications. Moreover, humans interact with the LM through explicit prompts, which prevents the LM from receiving feedback from the compiler and test cases to automatically optimize its repair policies. In this paper, we explore how small-scale LMs (fewer than 20B parameters) can achieve excellent performance through process supervision and feedback. We start by constructing a dataset named CodeNet4Repair, replete with multiple repair records, which supervises the fine-tuning of a foundation model. Building upon the encouraging outcomes of reinforcement learning, we develop a reward model that serves as a critic, providing feedback on the fine-tuned LM's actions and progressively optimizing its policy. During inference, we require the LM to generate solutions iteratively until the repair effect no longer improves or the maximum step limit is reached. The results show that the process-based approach not only outperforms larger outcome-based generation methods but also nearly matches the performance of closed-source commercial large-scale LMs.
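The iterative inference procedure described in the abstract (generate repairs until the repair effect stops improving or a step limit is reached) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `iterative_repair`, `propose_fix`, `score`, and the toy helpers are hypothetical names, and the scoring function merely stands in for whatever signal (reward model, compiler, or test-case feedback) measures the repair effect.

```python
from typing import Callable, List, Tuple


def iterative_repair(
    program: str,
    propose_fix: Callable[[str], str],   # hypothetical: stands in for the fine-tuned LM
    score: Callable[[str], float],       # hypothetical: stands in for the repair-effect signal
    max_steps: int = 5,                  # assumed step limit, not taken from the paper
) -> Tuple[str, List[float]]:
    """Refine `program` until the score stops improving or `max_steps` is reached."""
    best, best_score = program, score(program)
    history = [best_score]
    for _ in range(max_steps):
        candidate = propose_fix(best)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            # The repair effect no longer improves: stop early.
            break
        best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_propose_fix(p: str) -> str:
    """Pretend repair step: remove one 'BUG' marker per call."""
    return p.replace("BUG", "", 1)


def toy_score(p: str) -> float:
    """Pretend critic: fewer 'BUG' markers means a higher score."""
    return float(-p.count("BUG"))


if __name__ == "__main__":
    repaired, scores = iterative_repair("aBUGbBUGc", toy_propose_fix, toy_score)
    print(repaired, scores)
```

The loop keeps the best program seen so far and stops at the first step that fails to improve on it, which mirrors the two stopping conditions named in the abstract.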

Paper Structure (32 sections, 5 equations, 13 figures, 4 tables, 1 algorithm)

Introduction
Data Collection
Problem Description Collection
Program Preliminary Filtering
Program Fine Filtering
Method
Supervised Fine Tuning on APR Task
Process-based Feedback
Reward Modeling
Reinforcement Learning
Multi-step Generation Under RM Supervision
Experiments
Data Preparation
Experimental Setup
Baselines
...and 17 more sections

Figures (13)

Figure 1: The general procedure for competitors to refine a solution on a programming contest platform. Initially, they draft a solution based on the problem description and additional constraints such as time and memory limits. They then progressively improve their solution using feedback from the platform, such as exceeding time or memory limits, until they achieve an accepted result.
Figure 2: An illustration of process-based Automated Program Repair with compiler and test case feedback: (1) The introduction of a clean, privacy-protected dataset called CodeNet4Repair. (2) The application of Supervised Fine-Tuning (SFT) on pre-trained language models. (3) The incorporation of process-based feedback via reinforcement learning (RL). This process includes establishing a reward model as a critic and having the LM adjust its repair policies based on the critic's feedback. Both SFT and RL are trained on the CodeNet4Repair training set.
Figure 3: (a) The edit distances between three programs after 2-step modifications. 0-1: The edit distance after the first refinement; 1-2: The edit distance after the second refinement; 0-2: The edit distance of a single-step refinement. (b) The distribution of test cases. Most of the test cases are concentrated between 10 and 20.
Figure 4: Performance comparison on other fine-tuned open-source models.
Figure 5: Performance of explicit prompts at different training steps. At three difficulty levels, the best performance was achieved at three training steps. ChatGPT can effectively understand feedback from the compiler and test cases. For small-scale models, explicit prompts are still difficult to understand.
...and 8 more figures
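The edit distances compared in Figure 3(a) can be illustrated with a small sketch. The excerpt does not say which edit distance or granularity the authors use, so the code below assumes a character-level Levenshtein distance; `edit_distance` and the three toy programs are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (assumed metric, not confirmed by the source)."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


# Hypothetical program versions: original (0), first refinement (1), second refinement (2).
p0 = "print(a - b)"
p1 = "print(a + b)"
p2 = "print(int(a) + int(b))"

d01 = edit_distance(p0, p1)  # step 0-1
d12 = edit_distance(p1, p2)  # step 1-2
d02 = edit_distance(p0, p2)  # direct 0-2
```

Under any true edit-distance metric the direct 0-2 distance cannot exceed the sum of the 0-1 and 1-2 distances, so each step of a two-step repair is never forced to be larger than the single-step change it replaces.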
