From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security

Enna Basic, Alberto Giaretta

TL;DR

This systematic literature review investigates the security implications of large language models (LLMs) in code tasks, focusing on vulnerabilities introduced by generated code, the effectiveness of LLMs in detecting and fixing vulnerabilities, and the impact of poisoned training data. It follows a rigorous SLR methodology to synthesize findings across 47 focused sections, identifying ten vulnerability categories and highlighting prominent issues such as injection, memory management, and data exposure, mapped to CWE references. The review reveals that while LLMs can outperform traditional vulnerability detectors in some settings, they often suffer from high false positive rates and struggle with complex real-world code; prompting strategies, especially chain-of-thought, task-oriented, and role-based prompts, can materially improve performance, though gains are context-dependent. A critical and underexplored risk is data poisoning, which can cause LLMs to generate insecure code and may impair their ability to detect or fix vulnerabilities, underscoring the need for robust defenses and safer training practices. Overall, the paper provides a multi-faceted perspective on when and how LLMs are advantageous for code security, and it outlines key open challenges and directions for future work in secure prompting, model fine-tuning, and poisoning mitigation.

Abstract

Large Language Models (LLMs) have emerged as powerful tools for automating various programming tasks, including security-related ones such as detecting and fixing vulnerabilities. Despite their promising capabilities, when required to produce new code or modify pre-existing code, LLMs could introduce vulnerabilities without the programmer's knowledge. When analyzing code, they could miss clear vulnerabilities or flag nonexistent ones. In this Systematic Literature Review (SLR), we investigate both the security benefits and the potential drawbacks of using LLMs for a variety of code-related tasks. First, we focus on the types of vulnerabilities that LLMs could introduce when used to produce code. Second, we analyze the capabilities of LLMs to detect and fix vulnerabilities in a given piece of code, and how the chosen prompting strategy affects their performance in these two tasks. Last, we provide an in-depth analysis of how data poisoning attacks on LLMs can affect performance in the aforementioned tasks.

Paper Structure

This paper contains 47 sections, 10 figures, and 7 tables.

Figures (10)

  • Figure 1: Systematic literature review methodology.
  • Figure 2: Bar chart showing the number of studies discussing each category of security vulnerabilities introduced by LLMs.
  • Figure 3: Fine-tuning LLM with the security-specific dataset.
  • Figure 4: Example of a prompt, using the zero-shot prompting technique. In green, the main body of the instructions. In blue, the code to be analyzed by the LLM.
  • Figure 5: Example of a prompt, using the few-shot prompting technique. In the gray boxes, the code examples and corresponding vulnerabilities provided to the LLM. In green, the main body of the instructions. In blue, the code to be analyzed.
  • ...and 5 more figures