Security Vulnerabilities in AI-Generated Code: A Large-Scale Analysis of Public GitHub Repositories

Maximilian Schreiber, Pascal Tippe

TL;DR

This study analyzes security vulnerabilities in AI-generated code across public GitHub repositories by collecting 7,703 files attributed to four tools and applying CodeQL to map findings to CWE and CVE data. The large-scale analysis reveals that 87.9% of AI-generated code lacks CWE-mapped vulnerabilities, but language-specific patterns (notably Python) and tool-specific security densities (GitHub Copilot for Python/TypeScript; ChatGPT for JavaScript) emerge. It also finds widespread use of AI for documentation generation (39%), highlighting maintainability considerations in addition to security. The work provides language- and tool-aware recommendations for secure AI-assisted development and sets a foundation for future longitudinal and controlled studies to further strengthen safe integration of AI-generated code in software workflows.

Abstract

This paper presents a comprehensive empirical analysis of security vulnerabilities in AI-generated code across public GitHub repositories. We collected and analyzed 7,703 files explicitly attributed to four major AI tools: ChatGPT (91.52%), GitHub Copilot (7.50%), Amazon CodeWhisperer (0.52%), and Tabnine (0.46%). Using CodeQL static analysis, we identified 4,241 Common Weakness Enumeration (CWE) instances across 77 distinct vulnerability types. Our findings reveal that while 87.9% of AI-generated code does not contain identifiable CWE-mapped vulnerabilities, significant patterns emerge regarding language-specific vulnerabilities and tool performance. Python consistently exhibited higher vulnerability rates (16.18%-18.50%) compared to JavaScript (8.66%-8.99%) and TypeScript (2.50%-7.14%) across all tools. We observed notable differences in security performance, with GitHub Copilot achieving better security density for Python (1,739 LOC per CWE) and TypeScript, while ChatGPT performed better for JavaScript. Additionally, we discovered widespread use of AI tools for documentation generation (39% of collected files), an understudied application with implications for software maintainability. These findings extend previous work with a significantly larger dataset and provide valuable insights for developing language-specific and context-aware security practices for the responsible integration of AI-generated code into software development workflows.
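The two metrics quoted above can be illustrated with a minimal sketch. The abstract does not give exact definitions, so this assumes that security density is total lines of code divided by the number of CWE instances, and that the vulnerability rate is the percentage of files with at least one CWE finding. The function names and all input numbers below are hypothetical and are not taken from the paper's dataset.

```python
def loc_per_cwe(total_loc: int, cwe_count: int) -> float:
    """Security density: lines of code per CWE instance (higher is better).

    Assumed definition; returns infinity when no CWE instance was found.
    """
    if cwe_count == 0:
        return float("inf")
    return total_loc / cwe_count


def vulnerability_rate(files_with_cwe: int, total_files: int) -> float:
    """Percentage of files with at least one CWE finding (assumed definition)."""
    if total_files == 0:
        raise ValueError("total_files must be positive")
    return 100 * files_with_cwe / total_files


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    print(loc_per_cwe(total_loc=17_390, cwe_count=10))
    print(vulnerability_rate(files_with_cwe=37, total_files=200))
```

Under these assumptions, a higher LOC-per-CWE value means fewer findings per line of code, which is the sense in which one tool can be said to achieve "better" density than another for a given language.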

Paper Structure

This paper contains 23 sections, 8 tables.