
AI Agentic Vulnerability Injection And Transformation with Optimized Reasoning

Amine Lbath, Massih-Reza Amini, Aurelien Delaitre, Vadim Okun

TL;DR

AVIATOR introduces an AI-agentic vulnerability injection framework that orchestrates specialized agents and tools to generate realistic, category-specific vulnerabilities for large-scale datasets. By combining Retrieval-Augmented Generation and LoRA-based fine-tuning within a modular pipeline, AVIATOR achieves high injection success across diverse benchmarks and expands CWE coverage beyond prior approaches. Empirical results show strong gains from supervised fine-tuning over reinforcement learning and demonstrate that workflow depth and specialized models substantially improve performance. The framework also integrates static analysis for validation, enabling robust, scalable dataset generation with practical cost efficiency. This work advances automated vulnerability data generation, addressing data scarcity and enabling more reliable training and benchmarking of AI-based vulnerability detection and repair systems.

Abstract

The increasing complexity of software systems and the sophistication of cyber-attacks have underscored the critical need for effective automated vulnerability detection and repair systems. Data-driven approaches using deep learning models show promise but critically depend on the availability of large, accurately labeled datasets. Yet existing datasets suffer from noisy labels, cover a limited range of vulnerabilities, or fail to reflect vulnerabilities as they occur in real-world software. These shortcomings also limit large-scale benchmarking of such solutions. Automated vulnerability injection provides a way to directly address these dataset limitations, but existing techniques remain limited in coverage, contextual fidelity, or injection success rates. In this paper, we present AVIATOR, the first AI-agentic vulnerability injection workflow. It automatically injects realistic, category-specific vulnerabilities for high-fidelity, diverse, large-scale vulnerability dataset generation. Unlike prior monolithic approaches, AVIATOR orchestrates specialized AI agents, function agents, and traditional code analysis tools that replicate expert reasoning. It combines semantic analysis, injection synthesis enhanced with LoRA-based fine-tuning and Retrieval-Augmented Generation, and post-injection validation via static analysis and LLM-based discriminators. This modular decomposition allows specialized agents to focus on distinct tasks, improving robustness of injection and reducing error propagation across the workflow. Evaluations across three distinct benchmarks demonstrate that AVIATOR achieves 91%-95% injection success rates, significantly surpassing existing automated dataset generation techniques in both accuracy and scope of software vulnerabilities.
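The modular decomposition described in the abstract can be pictured as a chain of small stages that pass a shared state along. The following is only a minimal sketch under stated assumptions: the names `InjectionState`, `semantic_analysis`, `inject`, `validate`, and `run_pipeline` are hypothetical, the stage bodies are toy string checks standing in for LLM agents and static analyzers, and none of it is taken from the AVIATOR implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class InjectionState:
    """Hypothetical container passed from one stage to the next."""
    code: str
    cwe_id: str
    notes: List[str] = field(default_factory=list)
    accepted: Optional[bool] = None


Stage = Callable[[InjectionState], InjectionState]


def semantic_analysis(state: InjectionState) -> InjectionState:
    # Stand-in for an analysis agent that locates a suitable injection site.
    state.notes.append("analysis: located bounds check")
    return state


def inject(state: InjectionState) -> InjectionState:
    # Stand-in for the injection agent (assumed to be the fine-tuned,
    # retrieval-assisted step). Toy transformation: drop a bounds check.
    state.code = state.code.replace("if (n < SIZE) ", "")
    state.notes.append("injection: removed bounds check")
    return state


def validate(state: InjectionState) -> InjectionState:
    # Stand-in for static analysis plus an LLM-based discriminator.
    # Both checks here are simple string tests, purely for illustration.
    static_flag = "if (n < SIZE)" not in state.code
    discriminator_flag = "buf[n]" in state.code
    state.accepted = static_flag and discriminator_flag
    state.notes.append(f"validation: accepted={state.accepted}")
    return state


def run_pipeline(state: InjectionState, stages: List[Stage]) -> InjectionState:
    # Each stage does one job and hands its result to the next stage.
    for stage in stages:
        state = stage(state)
    return state


benign = "if (n < SIZE) buf[n] = value;"
result = run_pipeline(
    InjectionState(code=benign, cwe_id="CWE-787"),
    [semantic_analysis, inject, validate],
)
print(result.code)
print(result.accepted)
```

Keeping each stage behind the same function signature is what makes it possible to swap one agent (for example, a differently tuned injection model) without touching the others, which is the property the abstract attributes to the modular design.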

Paper Structure

This paper contains 58 sections, 9 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Architecture of the AVIATOR framework illustrating the two core modules. The top module represents the Vulnerability Injection component, which utilizes RAG and supervised fine-tuning with LoRA for guiding code transformations. The bottom module depicts the Validation component, employing LLM-based discriminators and static analysis to verify the presence and accuracy of injected vulnerabilities.
  • Figure 2: AVIATOR average injection success rate over 5 attempts. The empirical study compares LLM agents using Llama-4-Maverick, Qwen2.5-Coder without fine-tuning, and Qwen2.5-Coder with the vulnerability injection agent fine-tuned using SFT and GRPO.
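One plausible reading of the Figure 2 metric is a per-attempt success rate averaged across the attempts. The sketch below assumes that reading; the function names are hypothetical and the boolean outcomes are invented for illustration, not results from the paper.

```python
from statistics import mean
from typing import List


def attempt_success_rate(outcomes: List[bool]) -> float:
    """Fraction of samples in one attempt where injection succeeded."""
    if not outcomes:
        raise ValueError("an attempt needs at least one sample")
    return sum(outcomes) / len(outcomes)


def average_success_rate(attempts: List[List[bool]]) -> float:
    """Mean of the per-attempt success rates."""
    return mean(attempt_success_rate(a) for a in attempts)


# Illustrative data only: 5 attempts over the same 4 samples.
attempts = [
    [True, True, True, False],
    [True, True, True, True],
    [True, False, True, True],
    [True, True, False, False],
    [True, True, True, True],
]
avg = average_success_rate(attempts)
print(f"{avg:.2f}")
```

Averaging per-attempt rates rather than pooling all outcomes gives each attempt equal weight, which matters only if attempts cover different numbers of samples.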