
AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing

Ana Nunez, Nafis Tanveer Islam, Sumit Kumar Jha, Peyman Najafirad

TL;DR

This work tackles the security shortcomings of LLM-driven code generation by introducing AutoSafeCoder, a three-agent framework (Coding Agent, Static Analyzer Agent, Fuzzing Agent) that collaboratively generates secure, functional Python code through an iterative loop of static and dynamic analysis. It uses GPT-4o-based agents, evaluates on SecurityEval and HumanEval, and reports a 13% reduction in vulnerabilities relative to baseline LLMs with minimal loss of functionality. Key contributions include a concrete multi-agent collaboration protocol, CWE-based vulnerability detection, and a type-aware mutation fuzzing strategy whose findings guide code refinement. The results show that security can be improved without substantially sacrificing functionality, underscoring the practical potential of secure, AI-assisted code generation for Python.
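The TL;DR names a type-aware mutation fuzzing strategy without spelling out its operators. The following is a minimal sketch, assuming that a mutation is chosen according to the runtime Python type of each seed argument so that mutated inputs still match the function's expected parameter types. The function names (`mutate_value`, `mutate_int`, `mutate_str`, `mutate_list`) and the boundary values in `INT_EDGE_CASES` are hypothetical illustrations, not the authors' implementation.

```python
import random
import string
from typing import Any

# Hypothetical type-aware mutators: the operator chosen depends on the runtime
# type of the seed value. Illustrative only, not the paper's exact operators.
INT_EDGE_CASES = [0, -1, 1, 2**31 - 1, -(2**31), 2**63]


def mutate_int(value: int, rng: random.Random) -> int:
    # Jump to a boundary value half the time, otherwise perturb the seed.
    if rng.random() < 0.5:
        return rng.choice(INT_EDGE_CASES)
    return value + rng.randint(-10, 10)


def mutate_str(value: str, rng: random.Random) -> str:
    # Insert, delete, or replace one printable character.
    op = rng.choice(["insert", "delete", "replace"]) if value else "insert"
    ch = rng.choice(string.printable)
    if op == "insert":
        pos = rng.randrange(len(value) + 1)
        return value[:pos] + ch + value[pos:]
    pos = rng.randrange(len(value))
    if op == "delete":
        return value[:pos] + value[pos + 1:]
    return value[:pos] + ch + value[pos + 1:]


def mutate_list(value: list, rng: random.Random) -> list:
    # Mutate, drop, or duplicate one element; element types are preserved.
    result = list(value)
    if not result:
        return result  # no element type to infer from an empty list
    i = rng.randrange(len(result))
    op = rng.choice(["mutate", "drop", "duplicate"])
    if op == "mutate":
        result[i] = mutate_value(result[i], rng)
    elif op == "drop":
        del result[i]
    else:
        result.insert(i, result[i])
    return result


def mutate_value(value: Any, rng: random.Random) -> Any:
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return not value
    if isinstance(value, int):
        return mutate_int(value, rng)
    if isinstance(value, float):
        return rng.choice([0.0, -0.0, float("inf"), float("nan"), value * 2])
    if isinstance(value, str):
        return mutate_str(value, rng)
    if isinstance(value, list):
        return mutate_list(value, rng)
    return value  # unsupported types pass through unchanged


rng = random.Random(0)
seed_args = (42, "hello", [1, 2, 3])
mutated = tuple(mutate_value(arg, rng) for arg in seed_args)
```

Preserving each argument's type keeps most mutated calls past trivial type checks, so the fuzzer spends its budget on inputs that can reach deeper logic rather than failing immediately with a `TypeError`.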

Abstract

Recent advances in automatic code generation using large language models (LLMs) have brought us closer to fully automated secure software development. However, existing approaches often rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code. Traditional program synthesis with LLMs has focused primarily on functional correctness, often neglecting critical security issues that only manifest at runtime. To address these challenges, we propose AutoSafeCoder, a multi-agent framework in which LLM-driven agents collaborate continuously on code generation, vulnerability analysis, and security enhancement. The framework consists of three agents: a Coding Agent that generates code, a Static Analyzer Agent that identifies vulnerabilities, and a Fuzzing Agent that performs dynamic testing with a mutation-based fuzzing approach to detect runtime errors. Our contribution is to improve the security of multi-agent LLM code generation by integrating static and dynamic testing into an iterative generation loop. Experiments on the SecurityEval dataset demonstrate a 13% reduction in code vulnerabilities compared to baseline LLMs, with no compromise in functionality.
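The iterative loop described in the abstract can be sketched as plain control flow. This is a hypothetical illustration, not the authors' code: the three agents are modelled as ordinary Python callables (in the paper they are LLM-driven), the names `auto_safe_loop`, `LoopResult`, `max_rounds`, and the toy agents are assumptions, and the sketch assumes that static-analysis and fuzzing findings are merged into a single feedback list each round.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical agent interfaces. In AutoSafeCoder each agent is LLM-driven;
# here they are plain callables so the control flow runs offline.
CodingAgent = Callable[[str, List[str]], str]  # (requirement, feedback) -> code
StaticAnalyzer = Callable[[str], List[str]]    # code -> vulnerability reports
FuzzingAgent = Callable[[str], List[str]]      # code -> crash reports


@dataclass
class LoopResult:
    code: str
    iterations: int
    remaining_issues: List[str] = field(default_factory=list)


def auto_safe_loop(
    requirement: str,
    coder: CodingAgent,
    analyzer: StaticAnalyzer,
    fuzzer: FuzzingAgent,
    max_rounds: int = 3,
) -> LoopResult:
    code = coder(requirement, [])
    issues: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Audit statically, then exercise dynamically; merge both kinds of findings.
        issues = analyzer(code) + fuzzer(code)
        if not issues:
            return LoopResult(code, round_no)
        if round_no < max_rounds:
            code = coder(requirement, issues)  # revise using the combined feedback
    return LoopResult(code, max_rounds, issues)


# Toy stand-ins for the three agents (hypothetical, for illustration only).
def toy_coder(requirement: str, feedback: List[str]) -> str:
    return "int(user_input)" if feedback else "eval(user_input)"


def toy_analyzer(code: str) -> List[str]:
    return ["CWE-95: eval() on untrusted input"] if "eval(" in code else []


def toy_fuzzer(code: str) -> List[str]:
    return []


result = auto_safe_loop("parse an integer from user input", toy_coder, toy_analyzer, toy_fuzzer)
print(result.code, result.iterations)  # int(user_input) 2
```

Capping the loop at a fixed number of rounds bounds the number of LLM calls; returning `remaining_issues` makes it explicit when the budget ran out before the code passed both checks.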

Paper Structure (15 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Overview of our multi-agent framework integrating three agents: i) Coding Agent, ii) Static Analyzer Agent, and iii) Fuzzing Agent. The process begins with the Coding Agent generating code from code requirements. The Static Analyzer Agent performs code auditing. The Fuzzing Agent then mutates inputs to identify potential crashes. Any errors are fed back to the Coding Agent for further revisions.
  • Figure 2: Prompt template used by the Coding Agent for code generation.
  • Figure 3: Prompt template used by the Static Analyzer Agent to detect vulnerabilities.
  • Figure 4: Prompt template used to generate initial inputs for fuzzing.
  • Figure 5: Prompt template used by the Coding Agent to receive feedback from the Static Analyzer Agent.
  • ...and 1 more figure
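Figure 1 describes the Fuzzing Agent mutating inputs to find crashes and feeding any errors back to the Coding Agent. The harness below sketches that step under stated assumptions: seed inputs are supplied directly (the paper generates them with an LLM prompt, Figure 4), the mutator is passed in as a parameter, any uncaught exception counts as a crash, and crashes are deduplicated by exception type and raising line. `fuzz_for_crashes`, `parse_ratio`, and `toy_mutate` are hypothetical names.

```python
import random
import traceback
from typing import Any, Callable, Dict, List, Sequence, Tuple

Args = Tuple[Any, ...]


def fuzz_for_crashes(
    target: Callable[..., Any],
    seeds: Sequence[Args],
    mutate: Callable[[Args, random.Random], Args],
    trials: int = 500,
    seed: int = 0,
) -> List[str]:
    """Run `target` on the seeds and on mutated seeds; return one report per distinct crash.

    Hypothetical harness: `seeds` must be non-empty; any uncaught exception counts as a crash.
    """
    rng = random.Random(seed)
    inputs = list(seeds) + [mutate(rng.choice(seeds), rng) for _ in range(trials)]
    crashes: Dict[str, str] = {}
    for args in inputs:
        try:
            target(*args)
        except Exception as exc:
            # Deduplicate by exception type plus the innermost line that raised it.
            frame = traceback.extract_tb(exc.__traceback__)[-1]
            key = f"{type(exc).__name__} at line {frame.lineno}"
            crashes.setdefault(key, f"{key}: {exc!r} on input {args!r}")
    return list(crashes.values())


# Toy function under test (hypothetical), with latent crash paths.
def parse_ratio(text: str) -> float:
    num, den = text.split("/")
    return int(num) / int(den)


def toy_mutate(args: Args, rng: random.Random) -> Args:
    # Replace the first occurrence of a random digit or "/" with a random fragment.
    (text,) = args
    old = rng.choice("0123456789/")
    new = rng.choice(["", "0", "x", "//"])
    return (text.replace(old, new, 1),)


reports = fuzz_for_crashes(parse_ratio, seeds=[("3/4",)], mutate=toy_mutate)
for report in reports:
    print(report)  # each line is feedback that could be passed back to the Coding Agent
```

Deduplicating by exception type and line keeps the feedback short: the Coding Agent sees one representative input per failure mode rather than hundreds of near-identical crashes.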