Patch-to-PoC: A Systematic Study of Agentic LLM Systems for Linux Kernel N-Day Reproduction

Juefei Pu, Xingyu Li, Haonan Li, Zhengchuan Liang, Jonathan Cox, Yifan Wu, Kareem Shehada, Arrdya Srivastav, Zhiyun Qian

TL;DR

This work presents K-Repro, an LLM-driven agent for Linux kernel N-day vulnerability reproduction, and the first large-scale evaluation of such an agent. By integrating code browsing, VM control, and debugging within a standardized agent framework, K-Repro generates PoCs that reproduce over 50% of 100 KernelCTF vulnerabilities, at tens of minutes per case and modest cost, outperforming the state-of-the-art fuzzing approach in end-to-end efficiency. The study systematically analyzes how tool availability, input signals, and vulnerability properties affect performance: race-condition and temporal memory bugs remain particularly challenging, while informative commit messages significantly improve success. The results support the practical viability of autonomous security agents for kernel vulnerability analysis and provide design guidance for robust, scalable PoC generation and risk assessment in both offensive and defensive contexts.

Abstract

Autonomous large language model (LLM)-based systems have recently shown promising results across a range of cybersecurity tasks. However, no systematic study has examined their effectiveness in autonomously reproducing Linux kernel vulnerabilities with concrete proofs-of-concept (PoCs). Owing to the size, complexity, and low-level nature of the Linux kernel, such tasks are widely regarded as particularly challenging for current LLM-based approaches. In this paper, we present the first large-scale study of LLM-based Linux kernel vulnerability reproduction. For this purpose, we develop K-Repro, an LLM-based agentic system equipped with controlled code-browsing, virtual machine management, interaction, and debugging capabilities. Using kernel security patches as input, K-Repro automates end-to-end bug reproduction of N-day vulnerabilities in the Linux kernel. On a dataset of 100 real-world exploitable Linux kernel vulnerabilities collected from KernelCTF, our results show that K-Repro can generate PoCs that reproduce over 50% of the cases at practical time and monetary cost. Beyond aggregate success rates, we perform an extensive study of effectiveness, efficiency, stability, and impact factors to explain when agentic reproduction succeeds, where it fails, and which components drive performance. These findings provide actionable guidance for building more reliable autonomous security agents and for assessing real-world N-day risk from both offensive and defensive perspectives.

Paper Structure

This paper contains 46 sections, 2 figures, and 10 tables.

Figures (2)

  • Figure 1: Overall Architecture
  • Figure 2: Time distribution across case percentiles.