LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan

TL;DR

This study evaluates the security of C/C++ code generated by ten Large Language Models by mapping weaknesses to CWE categories and CVEs, and by applying static analysis with CodeQL, Snyk Code, and CodeShield. It introduces a CWE-based prompt dataset, a dual-assistant code-generation workflow (CG and SCG), and a reproducible methodology for analyzing vulnerabilities across 20 codebases. The results reveal substantial CWE presence across models, with high-risk weaknesses such as CWE-119, CWE-120, and CWE-787 linked to real-world CVEs, underscoring the need for cautious deployment and thorough review of AI-generated code. The findings motivate improved prompt design, cross-tool validation, and extended evaluation across languages and newer models to advance secure automated code generation.
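To make the weakness classes named above concrete, the following is a minimal sketch of the pattern behind CWE-120 (buffer copy without checking the size of the input) and CWE-787 (out-of-bounds write), together with one bounded alternative. It is an illustration only and is not taken from the paper's prompt dataset or generated codebases; the helper name `bounded_copy` and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Unsafe pattern (shown only as a comment): the copy length is driven by the
 * input, not by the destination size, so a long input writes past the buffer.
 *
 *     char buf[8];
 *     strcpy(buf, input);   // CWE-120 / CWE-787 if strlen(input) >= 8
 */

/*
 * Hypothetical helper (an assumption for this sketch, not from the paper):
 * copies src into dst without ever writing more than dst_size bytes and
 * always NUL-terminates the result.
 * Returns 0 if the whole string fit, -1 on bad arguments or truncation.
 */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }

    size_t len = strlen(src);
    if (len >= dst_size) {
        /* Input too long: keep what fits and report truncation. */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;
    }

    /* Input fits: copy it together with its terminating NUL. */
    memcpy(dst, src, len + 1);
    return 0;
}
```

A caller would pass the real destination size, for example `bounded_copy(buf, sizeof buf, input)`, and treat a return value of -1 as an error rather than continuing with truncated data.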

Abstract

The security of code generated by large language models (LLMs) is a significant concern, as studies indicate that such code often contains vulnerabilities and lacks essential defensive programming constructs. This work focuses on examining and evaluating the security of LLM-generated code, particularly in the context of C/C++. We categorized known vulnerabilities using the Common Weakness Enumeration (CWE) and, to study their criticality, mapped them to CVEs. We used ten different LLMs for code generation and analyzed the outputs through static analysis. The number of CWEs present in AI-generated code is concerning. Our findings highlight the need for developers to be cautious when using LLM-generated code. This study provides valuable insights to advance automated code generation and encourage further research in this domain.

Paper Structure

This paper contains 22 sections, 2 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Workflow Overview.
  • Figure 2: Workflow for prompt generation.
  • Figure 3: Demonstrative example for CWE-14 from MITRE.
  • Figure 4: Unique CWEs found by CodeQL, Snyk Code, and CodeShield for CG and SCG across ten different models.
  • Figure 5: CWEs detected by CodeQL in the CG and SCG codebases of each model.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Definition 1.1: Vulnerability
  • Definition 1.2: Common Weakness Enumeration (CWE)
  • Definition 1.3: Common Vulnerabilities and Exposures (CVE)