A Survey of Bugs in AI-Generated Code

Ruofan Gao, Amjed Tahir, Peng Liang, Teo Susnjak, Foutse Khomh

TL;DR

This systematic literature review consolidates scattered findings on bugs in AI-generated code, delivering a comprehensive bug taxonomy and analyzing how bug types are distributed across model families. It shows that functional and syntax issues are prevalent, with semantic/logic errors and model hallucinations posing major challenges, and highlights how model size, training data, and prompting influence bug patterns. The study surveys mitigation approaches ranging from prompt engineering and code-enhancement modules to autonomous coding agents and program-analysis-based repairs, arguing for hybrid, end-to-end workflows. It also identifies open gaps, such as hallucination attribution and the lack of robust benchmarks, and urges continued development of benchmarks, targeted training, and cross-domain evaluation to improve reliability in AI-assisted software development.

Abstract

Developers widely use AI code-generation models to increase productivity and efficiency. However, there are quality concerns regarding AI-generated code. The code is produced by models trained on publicly available code, which is known to contain bugs and quality issues. Those issues can cause trust and maintenance challenges during the development process. Several quality issues associated with AI-generated code have been reported, including bugs and defects. However, these findings are often scattered and lack a systematic summary. A comprehensive review that reveals the types and distribution of these errors, possible remediation strategies, and their correlation with specific models is currently lacking. In this paper, we systematically analyze the existing literature on AI-generated code to establish an overall understanding of bugs and defects in generated code, providing a reference for future model improvement and quality assessment. We aim to understand the nature and extent of bugs in AI-generated code, and provide a classification of bug types and patterns present in code generated by different models. We also discuss possible fixes and mitigation strategies adopted to eliminate bugs from the generated code.

Paper Structure

This paper contains 40 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: The systematic literature review workflow
  • Figure 2: Distribution of the selected studies across venues and programming languages
  • Figure 3: The frequency of datasets and bug detection methods in studies
  • Figure 4: Taxonomy of bugs in AI-generated code
  • Figure 5: Overall bug type frequency and their relation to hallucination
  • ...and 2 more figures