Can LLMs Patch Security Issues?

Kamel Alrashedy, Abdullah Aljasser, Pradyumna Tambwekar, Matthew Gombolay

TL;DR

This work proposes Feedback-Driven Security Patching (FDSP), in which LLMs automatically refine the vulnerable code they generate. FDSP uses automatic static code analysis to help the LLM generate and implement potential solutions that address the vulnerabilities.

Abstract

Large Language Models (LLMs) have shown impressive proficiency in code generation. Unfortunately, these models share a weakness with their human counterparts: they produce code that inadvertently contains security vulnerabilities. These vulnerabilities could allow unauthorized attackers to access sensitive data or systems, which is unacceptable for safety-critical applications. In this work, we propose Feedback-Driven Security Patching (FDSP), in which LLMs automatically refine the vulnerable code they generate. Our approach leverages automatic static code analysis to empower the LLM to generate and implement potential solutions to address vulnerabilities. We address the research community's need for safe code generation by introducing a large-scale dataset, PythonSecurityEval, covering the diversity of real-world applications, including databases, websites, and operating systems. We empirically validate that FDSP, through its procedure of injecting targeted, external feedback, outperforms prior work that uses self-feedback from LLMs by up to 17.6%. Code and data are available at https://github.com/Kamel773/LLM-code-refine.
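
The refinement procedure described in the abstract can be sketched as a small loop. This is a minimal illustration, not the authors' implementation: the function name `fdsp_refine` and the three callables (`analyze`, `propose`, `refine`) are hypothetical stand-ins for the static analyzer and the two LLM prompts. Applying the solutions cumulatively and stopping as soon as the analyzer reports no issues are also assumptions.

```python
from typing import Callable, List


def fdsp_refine(
    code: str,
    analyze: Callable[[str], List[str]],
    propose: Callable[[str, List[str]], List[str]],
    refine: Callable[[str, str], str],
) -> str:
    """Refine `code` using external feedback from a static analyzer.

    analyze(code)         -> list of issue descriptions (hypothetical wrapper,
                             e.g. around a static analysis tool such as Bandit)
    propose(code, issues) -> list of candidate solutions (hypothetical LLM call)
    refine(code, solution)-> rewritten code (hypothetical LLM call)
    """
    issues = analyze(code)
    if not issues:
        # Nothing was flagged, so the code is returned unchanged.
        return code

    # Assumption: solutions are applied one after another to the running
    # version of the code, and we stop once the analyzer is satisfied.
    for solution in propose(code, issues):
        code = refine(code, solution)
        if not analyze(code):
            break
    return code
```

Passing the analyzer and the LLM calls in as parameters keeps the sketch independent of any particular model or tool, so the loop can be exercised with simple stubs before wiring in real components.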

Paper Structure (17 sections, 5 equations, 7 figures, 6 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overview of our approach. First, the LLM generates code. Bandit, a static code analysis tool, then analyzes this code to identify potential security issues (see Figure 2). The identified issues are passed back to the LLM, which generates possible solutions for resolving them. Finally, each proposed solution is sent back to the LLM for code refinement.
  • Figure 2: An example of the report generated by Bandit, a static code analysis tool, for the vulnerable code in the paper's example code snippet.
  • Figure 3: The total count of the five most frequent security issues across five refinement approaches for CodeLlama in the PythonSecurityEval dataset.
  • Figure 4: Comparison of the total number of unresolved vulnerable code instances identified by three LLMs on the PythonSecurityEval dataset.
  • Figure 5: The total count of the most common security issues in the code generated for the PythonSecurityEval dataset (Top 10).
  • ...and 2 more figures
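
Issue counts like those summarized in the figure captions above could be tallied from Bandit's JSON reports roughly as follows. This is a sketch under stated assumptions, not the authors' evaluation script: the helper names `run_bandit` and `top_issues` are hypothetical, and it assumes the `bandit` command-line tool is installed and that each report has a `results` list whose entries carry a `test_id` field, as in Bandit's JSON output format.

```python
import json
import subprocess
from collections import Counter
from typing import Dict, List, Tuple


def run_bandit(path: str) -> Dict:
    """Run Bandit on one file and return its parsed JSON report.

    Assumes the `bandit` CLI is on PATH. Bandit exits with a non-zero
    status when it finds issues, so the return code is not checked.
    """
    proc = subprocess.run(
        ["bandit", "-f", "json", "-q", path],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def top_issues(reports: List[Dict], n: int = 5) -> List[Tuple[str, int]]:
    """Count Bandit test IDs across reports and return the n most frequent."""
    counts = Counter(
        result["test_id"]
        for report in reports
        for result in report.get("results", [])
    )
    return counts.most_common(n)
```

For example, `top_issues([run_bandit(p) for p in paths])` would give the most frequent issue types over a set of generated files, where `paths` is a hypothetical list of file paths.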