Program Repair

Xiang Gao; Yannic Noller; Abhik Roychoudhury

TL;DR

Automated program repair (APR) aims to fix bugs and vulnerabilities automatically, typically using test suites as specifications. The paper surveys three core APR families: search-based, constraint-based (semantic), and learning-based repair. It details their workflows and capabilities, along with the central challenge of patch overfitting, in which a patch passes the available tests without being correct in general. It discusses methods to mitigate overfitting, including test generation, heuristic ranking, semantic analysis, and human-in-the-loop review, and covers the landscape of tools, industrial deployments, and applications in security, development workflows, and education. Finally, it addresses emerging opportunities in language-model-based code generation and synergies with testing that could increase trust and adoption in real-world software engineering.
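To make the overfitting problem in the summary above concrete, here is a minimal generate-and-validate sketch in Python. It is not taken from the paper: the buggy condition, the candidate patches, and the helper names (`make_capitalize`, `passes`, `plausible_patches`) are all hypothetical, loosely modeled on the capitalize-the-first-letter example described in the caption of Figure 2. The test suite acts as the only specification, so any candidate that passes it counts as a plausible patch.

```python
from typing import Callable, List, Tuple

Test = Tuple[str, str]  # (input, expected output)


def make_capitalize(cond: Callable[[int], bool]) -> Callable[[str], str]:
    """Build a program variant whose if-condition is the given predicate.

    The intended behavior (an assumption for this sketch) is to uppercase
    only the first character, i.e. the correct condition is `i == 0`.
    """
    def capitalize(s: str) -> str:
        return "".join(ch.upper() if cond(i) else ch for i, ch in enumerate(s))
    return capitalize


# Hypothetical search space: candidate replacements for the if-condition.
CANDIDATES: List[Tuple[str, Callable[[int], bool]]] = [
    ("i == 1", lambda i: i == 1),          # assumed original (buggy) condition
    ("i % 2 == 0", lambda i: i % 2 == 0),  # wrong in general
    ("i != 1", lambda i: i != 1),          # wrong in general
    ("i == 0", lambda i: i == 0),          # correct condition
]


def passes(program: Callable[[str], str], tests: List[Test]) -> bool:
    """A candidate is accepted only if it passes every test."""
    return all(program(inp) == expected for inp, expected in tests)


def plausible_patches(tests: List[Test]) -> List[str]:
    """Enumerate the search space and keep every test-passing candidate."""
    return [name for name, cond in CANDIDATES
            if passes(make_capitalize(cond), tests)]


# A weak test suite cannot tell the correct patch from overfitting ones.
weak_tests: List[Test] = [("a", "A"), ("hi", "Hi")]
# One extra (e.g. generated) test rules out the overfitting candidates.
strong_tests: List[Test] = weak_tests + [("hello", "Hello")]

if __name__ == "__main__":
    print(plausible_patches(weak_tests))
    print(plausible_patches(strong_tests))
```

With the weak suite, three candidates survive, two of which are incorrect on longer inputs; adding a single longer test leaves only `i == 0`. This mirrors, in miniature, why test generation appears among the overfitting mitigations listed above.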

Abstract

Automated program repair is an emerging technology which consists of a suite of techniques to automatically fix bugs or vulnerabilities in programs. In this paper, we present a comprehensive survey of the state of the art in program repair. We first study the different suites of techniques used, including search-based repair, constraint-based repair, and learning-based repair. We then discuss one of the main challenges in program repair, namely patch overfitting, by distilling a class of techniques which can alleviate patch overfitting. We then discuss classes of program repair tools, applications of program repair, as well as uses of program repair in industry. We conclude the survey with a forward-looking outlook on future usages of program repair, as well as research opportunities arising from work on code from large language models.

Paper Structure (96 sections, 35 equations, 28 figures, 4 tables, 2 algorithms)

Introduction
Program Repair in a Nutshell
Supporting Software Evolution
Challenges
Applications
Organization
Existing overview articles
Search-Based Program Repair
Basic Search-Based Repair Workflow
Search Space Exploration
Genetic Programming
Mutation
Crossover
Fitness
Pattern-Based Search
...and 81 more sections

Figures (28)

Figure 1: Software Development Life-Cycle
Figure 2: The left part is a simple example that capitalizes the first letter of a given string, where the bug happens on the if-condition. The correct condition is i == 0. The right part shows two tests and their corresponding expected outputs.
Figure 3: Triangle program adapted from cacm19
Figure 4: The illustration of the repair constraints.
Figure 5: SE-ESOC encoding with four components ("x", "y", "+", "-") and three nodes.
...and 23 more figures

Theorems & Definitions (2)

definition 1: Test-equivalence
definition 2: Value-based test-equivalence
