A Dual-Loop Agent Framework for Automated Vulnerability Reproduction

Bin Liu, Yanjie Zhao, Zhenpeng Chen, Guoai Xu, Haoyu Wang

TL;DR

Cve2PoC reproduces vulnerabilities from CVE descriptions with a dual-loop, LLM-based framework that separates strategic vulnerability analysis from tactical PoC generation. The Strategic Planner builds structured attack plans, the Tactical Executor synthesizes PoCs and verifies them through progressive checks, and the Adaptive Refiner routes each failure to either code-level refinement or strategy replanning, using dual-dimension feedback with sparse experience indexing. On SecBench.js and PatchEval, Cve2PoC achieves reproduction success rates of 82.9% and 54.3%, respectively, outperforming the best baseline by 11.3% and 20.4% while requiring far fewer tokens. Human evaluators find the generated PoCs comparable to human-written exploits in readability and reusability. The framework combines structured pre-analysis, multi-layer verification, and memory-efficient learning to support cross-language PoC generation for security research and practice.

Abstract

Automated vulnerability reproduction from CVE descriptions requires generating executable Proof-of-Concept (PoC) exploits and validating them in target environments. This process is critical in software security research and practice, yet it remains time-consuming and demands specialized expertise when performed manually. While LLM agents show promise for automating this task, existing approaches often conflate exploring attack directions with fixing implementation details, which leads to unproductive debugging loops when reproduction fails. To address this, we propose Cve2PoC, an LLM-based dual-loop agent framework following a plan-execute-evaluate paradigm. The Strategic Planner analyzes vulnerability semantics and target code to produce structured attack plans. The Tactical Executor generates PoC code and validates it through progressive verification. The Adaptive Refiner evaluates execution results and routes failures to different loops: the Tactical Loop for code-level refinement and the Strategic Loop for attack strategy replanning. This dual-loop design enables the framework to escape ineffective debugging by matching remediation to failure type. Evaluation on two benchmarks covering 617 real-world vulnerabilities demonstrates that Cve2PoC achieves 82.9% and 54.3% reproduction success rates on SecBench.js and PatchEval, respectively, outperforming the best baseline by 11.3% and 20.4%. Human evaluation confirms that the generated PoCs are comparable to human-written exploits in readability and reusability.
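
The routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Route`, `Result`, and `reproduce`, the three callables, and the retry budgets `max_plans` and `max_fixes` are all hypothetical, introduced here only to show how an evaluator's verdict might select between the two loops.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Route(Enum):
    """Hypothetical verdicts an evaluator could return."""
    SUCCESS = "success"
    TACTICAL = "tactical"    # keep the plan, repair the PoC code
    STRATEGIC = "strategic"  # abandon the plan, replan the attack


@dataclass
class Result:
    route: Route
    feedback: str = ""


def reproduce(
    plan_fn: Callable[[str, str], str],          # (description, feedback) -> plan
    generate_fn: Callable[[str, str], str],      # (plan, feedback) -> PoC code
    evaluate_fn: Callable[[str, str], Result],   # (plan, PoC) -> verdict
    cve_description: str,
    max_plans: int = 3,   # assumed budget, not taken from the paper
    max_fixes: int = 3,   # assumed budget, not taken from the paper
) -> Optional[str]:
    """Return a verified PoC, or None once both budgets are exhausted."""
    feedback = ""
    for _ in range(max_plans):                   # Strategic Loop
        plan = plan_fn(cve_description, feedback)
        feedback = ""                            # a new plan starts with fresh code feedback
        for _ in range(max_fixes):               # Tactical Loop
            poc = generate_fn(plan, feedback)
            result = evaluate_fn(plan, poc)
            if result.route is Route.SUCCESS:
                return poc
            feedback = result.feedback
            if result.route is Route.STRATEGIC:
                break                            # stop code-level debugging and replan
    return None
```

In this sketch, a `STRATEGIC` verdict exits the inner loop immediately instead of spending the remaining code-fix budget on a plan the evaluator has already rejected, which is one way to express "matching remediation to failure type".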

Paper Structure

This paper contains 43 sections, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: Overview of the Cve2PoC framework.
  • Figure 2: Structured outputs from Strategic Planner agents.
  • Figure 3: Impact analysis across different dimensions. (a) Performance varies by vulnerability type, with open-source models surpassing cloud models on Code Injection. (b) Older CVEs are harder to reproduce. (c) Shorter descriptions favor cloud models. (d) Medium-sized codebases yield optimal results.