A Survey of Learning-based Automated Program Repair

Quanjun Zhang, Chunrong Fang, Yuxiang Ma, Weisong Sun, Zhenyu Chen

TL;DR

This survey reviews learning-based automated program repair (APR), framing patch generation as neural machine translation and detailing the full DL-based repair workflow from fault localization to patch correctness. It synthesizes datasets, metrics, empirical studies, and industrial deployments, and discusses state-of-the-art techniques across semantic, syntax, and security repair domains. The authors provide practical guidelines (covering code representations, patch validation, and pre-trained models) and outline open science challenges to guide future research and real-world adoption. The work highlights the rapid adoption of pre-trained models, multi-language repair, and the integration of DL with traditional APR, while stressing the need for robust patch correctness assessment and reproducibility.

Abstract

Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing number of APR techniques have been proposed to leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, where buggy code snippets (i.e., source language) are translated into fixed code snippets (i.e., target language) automatically. Benefiting from the powerful capability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this paper, we provide a systematic survey to summarize the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail the crucial components, including the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss the widely adopted datasets and evaluation metrics and outline existing empirical studies. We discuss several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight several practical guidelines on applying DL techniques for future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our paper can help researchers gain a comprehensive understanding of the achievements of the existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at https://github.com/QuanjunZhang/AwesomeLearningAPR.
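The workflow named in the abstract (fault localization, patch generation, patch ranking, patch validation) can be illustrated with a toy sketch. This is not the survey's method or any surveyed tool: the buggy function `max_of`, the rewrite table `TOY_REWRITES` with its scores, and all helper names are hypothetical, and a fixed table of token rewrites stands in for a trained NMT model that would normally decode candidate patches.

```python
from typing import List, Optional, Tuple

# Hypothetical buggy program, stored as source text (the "source language").
BUGGY_SOURCE = "def max_of(a, b):\n    return a if a < b else b\n"

# Toy stand-in for a trained NMT model: (buggy token, fixed token, score).
# A real learning-based APR tool would produce scored candidates by decoding.
TOY_REWRITES: List[Tuple[str, str, float]] = [
    (" < ", " <= ", 0.2),
    (" < ", " > ", 0.7),
    (" < ", " == ", 0.1),
]


def localize(source: str) -> int:
    """Fault localization: return the index of the suspicious line.
    Assumption for this sketch: the return statement is the suspect."""
    for i, line in enumerate(source.splitlines()):
        if "return" in line:
            return i
    raise ValueError("no suspicious line found")


def generate(source: str, line_no: int) -> List[Tuple[str, float]]:
    """Patch generation: 'translate' the buggy line into candidate fixes."""
    lines = source.splitlines()
    candidates = []
    for old, new, score in TOY_REWRITES:
        if old in lines[line_no]:
            patched = lines.copy()
            patched[line_no] = lines[line_no].replace(old, new, 1)
            candidates.append(("\n".join(patched) + "\n", score))
    return candidates


def rank(candidates: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Patch ranking: most likely candidate first."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def passes_tests(source: str) -> bool:
    """Patch validation: load the candidate and run a small test suite."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        f = namespace["max_of"]
        return f(1, 2) == 2 and f(5, 3) == 5 and f(4, 4) == 4
    except Exception:
        return False


def repair(source: str) -> Optional[str]:
    """Run the stages in order and return the first test-passing patch."""
    line_no = localize(source)
    for patched, _score in rank(generate(source, line_no)):
        if passes_tests(patched):
            return patched
    return None


if __name__ == "__main__":
    print(repair(BUGGY_SOURCE))
```

A patch returned by `repair` is only plausible, meaning it passes the available tests. Whether it is actually correct is the concern of the separate patch correctness phase, which this sketch does not model.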

Paper Structure (43 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: General workflow of the paper collection
  • Figure 2: Collected learning-based APR papers from 2016 to 2022
  • Figure 3: Paper distribution on programming languages
  • Figure 4: Overview of APR
  • Figure 5: Detailed workflow of Learning-based APR
  • ...and 1 more figures

Theorems & Definitions (2)

  • definition 1
  • definition 2