Where Is Self-admitted Code Generated by Large Language Models on GitHub?

Xiao Yu, Lei Liu, Xing Hu, Jin Liu, Xin Xia

TL;DR

This study empirically examines self-admitted GPT-generated code on GitHub, finding that ChatGPT and Copilot account for nearly all of it and that it appears mainly in small to medium-sized projects. Generated snippets are typically short and low in complexity, contribute only a small fraction of total project LOC, and undergo few later modifications, with bug-related changes ranging from 4% to 12%. An annotation and analysis pipeline that combines a manual taxonomy with SonarQube metrics characterizes the code types, the modification patterns, and the comments surrounding GPT-generated code, which are sparse and rarely go beyond stating that an LLM was used. The findings inform practitioners about practical usage patterns, guide researchers toward targeted evaluation and detection benchmarks, and suggest best practices for documenting generated code within software projects.

Abstract

The increasing use of Large Language Models (LLMs) in software development has garnered significant attention from researchers evaluating the capabilities and limitations of LLMs for code generation. However, much of the research focuses on controlled datasets such as HumanEval, which do not adequately capture the characteristics of LLM-generated code in real-world development scenarios. To address this gap, our study investigates self-admitted code generated by LLMs on GitHub, specifically focusing on instances where developers in projects with over five stars acknowledge the use of LLMs to generate code through code comments. Our findings reveal several key insights: (1) ChatGPT and Copilot dominate code generation, with minimal contributions from other LLMs. (2) ChatGPT/Copilot-generated code appears mainly in small/medium-sized projects that are led by small teams and are continuously evolving. (3) ChatGPT/Copilot-generated code generally makes up a minor portion of a project and consists primarily of short/moderate-length, low-complexity snippets (e.g., algorithms and data structures code; text processing code). (4) ChatGPT/Copilot-generated code generally undergoes minimal modifications, with bug-related changes ranging from 4% to 12%. (5) Most code comments only state LLM use, while few include details like prompts, human edits, or code testing status. Based on these findings, we discuss the implications for researchers and practitioners.
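
To make the notion of "self-admitted" code concrete, here is a minimal sketch of how a comment acknowledging LLM use might be detected. It is not the authors' pipeline: the keyword pattern, the function name `find_self_admitted`, and the restriction to single-line `#` and `//` comments are all assumptions made for illustration, since the search terms actually used in the study are not stated in this section.

```python
import re

# Hypothetical keyword pattern (an assumption, not the study's search terms).
SELF_ADMIT = re.compile(
    r"\b(?:generated|written|created|produced)\s+(?:by|with|using)\s+"
    r"(?P<tool>chatgpt|(?:github\s+)?copilot)\b",
    re.IGNORECASE,
)


def find_self_admitted(source: str) -> list[tuple[int, str]]:
    """Return (line_number, tool) for comment lines that acknowledge LLM use."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        # Simplification: only single-line '#' or '//' comments are considered.
        if not stripped.startswith(("#", "//")):
            continue
        match = SELF_ADMIT.search(stripped)
        if match:
            # Normalize the matched tool name to one of the two dominant LLMs.
            tool = "Copilot" if "copilot" in match.group("tool").lower() else "ChatGPT"
            hits.append((lineno, tool))
    return hits


sample = (
    "# Generated by ChatGPT\n"
    "def f():\n"
    "    return 1\n"
    "// written with GitHub Copilot\n"
    "x = 'generated by chatgpt'\n"
)
print(find_self_admitted(sample))
```

The string literal on the last sample line is skipped because it is not a comment. A keyword match of this kind would only surface candidates; deciding which code a comment actually covers would still require manual inspection.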

Paper Structure (15 sections, 4 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: The distribution of the stars and files in the projects containing the GPT-generated code.
  • Figure 2: The violin plot distribution of the LOC, cyclomatic complexity, and cognitive complexity of the GPT-generated code.
  • Figure 3: A bug-fix modification made to the GPT-generated code, which contains an algorithmic logic error.
  • Figure 4: A feature addition made to the GPT-generated code.