Program Repair

Xiang Gao, Yannic Noller, Abhik Roychoudhury

TL;DR

Automated program repair (APR) aims to fix bugs and vulnerabilities automatically, typically using test suites as specifications. The paper surveys three core APR families (search-based, constraint-based or semantic, and learning-based), describing their workflows and capabilities along with the central challenge of patch overfitting. It discusses methods to mitigate overfitting, including test generation, heuristic ranking, semantic analysis, and human-in-the-loop approaches, and reviews the landscape of tools, industrial deployments, and applications in security, development workflows, and education. Finally, it addresses emerging opportunities in language-model-based code generation and synergies with testing that could increase trust and adoption in real-world software engineering.

Abstract

Automated program repair is an emerging technology which consists of a suite of techniques to automatically fix bugs or vulnerabilities in programs. In this paper, we present a comprehensive survey of the state of the art in program repair. We first study the different suites of techniques used, including search-based repair, constraint-based repair, and learning-based repair. We then discuss one of the main challenges in program repair, namely patch overfitting, by distilling a class of techniques which can alleviate patch overfitting. We then discuss classes of program repair tools, applications of program repair, and uses of program repair in industry. We conclude the survey with a forward-looking outlook on future usages of program repair, as well as research opportunities arising from work on code from large language models.
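
To make the idea of patch overfitting concrete, the following is a minimal sketch of test-driven patch validation. It is loosely modeled on a string-capitalization bug located in an if-condition; the candidate conditions, the test inputs, and all names (`capitalize_with`, `CANDIDATES`, `plausible_patches`) are hypothetical and are not taken from the paper or from any repair tool.

```python
from typing import Callable, Dict, List, Tuple

# A candidate patch is modeled as a replacement for the faulty if-condition.
Condition = Callable[[int, str], bool]
Test = Tuple[str, str]  # (input, expected output)


def capitalize_with(cond: Condition, s: str) -> str:
    """Upper-case every character whose index satisfies the condition."""
    return "".join(c.upper() if cond(i, s) else c for i, c in enumerate(s))


# Hypothetical search space of candidate conditions (assumption, for illustration).
CANDIDATES: Dict[str, Condition] = {
    "i == 1": lambda i, s: i == 1,
    "i < len(s) - 1": lambda i, s: i < len(s) - 1,
    "i == 0": lambda i, s: i == 0,
}


def plausible_patches(tests: List[Test]) -> List[str]:
    """Return the candidates that pass every test in the given suite."""
    return [
        name
        for name, cond in CANDIDATES.items()
        if all(capitalize_with(cond, inp) == out for inp, out in tests)
    ]


# A small test suite acting as the (incomplete) specification.
weak_suite: List[Test] = [("ab", "Ab"), ("hi", "Hi")]

# One additional test, e.g. produced by test generation.
stronger_suite: List[Test] = weak_suite + [("abc", "Abc")]

if __name__ == "__main__":
    print(plausible_patches(weak_suite))      # two candidates survive
    print(plausible_patches(stronger_suite))  # only the intended fix survives
```

With the two-test suite, both `i < len(s) - 1` and `i == 0` pass, although only the latter generalizes; the former is an overfitting patch. Adding a single longer input rules it out, which is the intuition behind using test generation to mitigate overfitting.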

Paper Structure

This paper contains 96 sections, 35 equations, 28 figures, 4 tables, and 2 algorithms.

Figures (28)

  • Figure 1: Software Development Life-Cycle
  • Figure 2: The left part is a simple program that capitalizes the first letter of a given string; the bug is in the if-condition, and the correct condition is i == 0. The right part shows two tests and their expected outputs.
  • Figure 3: Triangle program adapted from cacm19
  • Figure 4: Illustration of the repair constraints.
  • Figure 5: SE-ESOC encoding with four components ("x", "y", "+", "-") and three nodes.
  • ...and 23 more figures

Theorems & Definitions (2)

  • Definition 1: Test-equivalence
  • Definition 2: Value-based test-equivalence