
Vul-R2: A Reasoning LLM for Automated Vulnerability Repair

Xin-Cheng Wen, Zirui Lin, Yijun Yang, Cuiyun Gao, Deheng Ye

TL;DR

Vul-R2 introduces a reasoning-based LLM framework for automated vulnerability repair (AVR) that addresses two gaps: the lack of vulnerability-specific reasoning data and the absence of verifiable intermediate feedback during training. It combines Domain-Aware Reasoning Learning (DARL), which covers Reasoning Answer Construction (RAC), data filtering, and domain-aware supervised fine-tuning (SFT), with Curriculum-based Verifiable Rewarded Training (CVRT), a two-stage reinforcement learning with verifiable rewards (RLVR) process that progressively strengthens verifiable reasoning. On PrimeVul and SVEN, Vul-R2 achieves state-of-the-art Exact Match (EM) and CodeBLEU scores, repairs significantly more vulnerabilities than code pre-trained models (CodePTMs) and standard LLM baselines, and remains robust across vulnerability types. The findings highlight the importance of explicit vulnerability-domain knowledge and structured, verifiable reasoning for effective AVR with real-world impact.
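CVRT runs RLVR in two stages that progressively strengthen verifiable reasoning. The staging criterion is not described here, so the following is only a sketch of a generic two-stage curriculum under stated assumptions: `RepairSample`, its `difficulty` field, and the 0.5 `threshold` are hypothetical placeholders, not the paper's definitions. A difficulty proxy could be, for example, how rarely the current model already produces the reference fix.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RepairSample:
    vulnerable_code: str
    reference_fix: str
    difficulty: float  # hypothetical score in [0, 1]; higher = harder


def two_stage_curriculum(
    samples: List[RepairSample], threshold: float = 0.5
) -> Tuple[List[RepairSample], List[RepairSample]]:
    """Split samples into an easier stage-1 pool and a harder stage-2 pool.

    Both pools are sorted easiest-first, so training inside a stage also
    moves from easier to harder examples. Illustrative only.
    """
    ordered = sorted(samples, key=lambda s: s.difficulty)
    stage1 = [s for s in ordered if s.difficulty < threshold]
    stage2 = [s for s in ordered if s.difficulty >= threshold]
    return stage1, stage2


if __name__ == "__main__":
    data = [
        RepairSample("a = b + c;", "if (b > INT_MAX - c) return -1; a = b + c;", 0.9),
        RepairSample("buf[i] = x;", "if (i >= len) return -1; buf[i] = x;", 0.1),
        RepairSample("memcpy(d, s, n);", "if (n > cap) return -1; memcpy(d, s, n);", 0.4),
    ]
    easy, hard = two_stage_curriculum(data)
    print([s.difficulty for s in easy], [s.difficulty for s in hard])  # [0.1, 0.4] [0.9]
```

Each pool would then be handed to an RL trainer in turn, stage 1 first; the trainer and the reward it optimizes are outside the scope of this sketch.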

Abstract

The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. Recent research has formulated AVR as a sequence generation problem and has leveraged large language models (LLMs) to address this problem. Typically, these approaches prompt or fine-tune LLMs to generate repairs for vulnerabilities directly. Although these methods show state-of-the-art performance, they face the following challenges: (1) Lack of high-quality, vulnerability-related reasoning data. Current approaches primarily rely on foundation models that mainly encode general programming knowledge. Without vulnerability-related reasoning data, they often fail to capture the diverse vulnerability repair patterns. (2) Difficulty verifying the intermediate vulnerability repair process during LLM training. Existing reinforcement learning methods often leverage intermediate execution feedback from the environment (e.g., sandbox-based execution results) to guide reinforcement learning training. In contrast, the vulnerability repair process generally lacks such intermediate, verifiable feedback, which poses additional challenges for model training.
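Challenge (2) is why an outcome-level reward is attractive: with no sandbox signal for intermediate steps, the only thing that can be checked is the final patch. The sketch below illustrates that idea under explicit assumptions and is not the paper's reward: it assumes a hypothetical output format in which the model reasons freely and then places its repaired code inside `<answer>...</answer>` tags, and it scores only that final answer by whitespace-normalized exact match against the reference fix. The reasoning trace itself earns no reward.

```python
import re
from typing import Optional

# Hypothetical output format: free-form reasoning, then the repair inside <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last <answer>...</answer> body, or None if the format is violated."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def normalize(code: str) -> str:
    """Collapse whitespace runs (assumption: formatting-only differences are ignored)."""
    return " ".join(code.split())


def outcome_reward(completion: str, reference: str) -> float:
    """Binary reward on the final repair only; intermediate reasoning is never scored."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # malformed output: nothing verifiable to score
    return 1.0 if normalize(answer) == normalize(reference) else 0.0


if __name__ == "__main__":
    out = (
        "<think>len may exceed the buffer size</think>"
        "<answer>if (len > sizeof(buf)) return -1;\nmemcpy(buf, src, len);</answer>"
    )
    ref = "if (len > sizeof(buf)) return -1; memcpy(buf, src, len);"
    print(outcome_reward(out, ref))  # 1.0
    print(outcome_reward("no tags here", ref))  # 0.0
```

Whitespace normalization is a simplifying choice; it would wrongly merge patches that differ only inside string literals or in semantically meaningful indentation.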

Paper Structure

This paper contains 48 sections, 5 equations, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Illustration of the vulnerability "Integer Overflow or Wraparound" (CWE-190) repaired by an open-source reasoning LLM (i.e., QwQ-32B) and by our Vul-R2. More detailed reasoning traces and the full Vul-R2 case are shown in the detailed case-study figure. We use QwQ-32B in this example because the reasoning process of proprietary LLMs, such as OpenAI-o3, is inaccessible under their usage policies.
  • Figure 2: The overview of Vul-R2.
  • Figure 3: The illustration of the prompt in the RAC. Contents in "{ }" will be substituted by the corresponding data.
  • Figure 4: The illustration of the prompt in the CVRT module.
  • Figure 5: Reward and mean response length during RLVR training (Vul-R2*), illustrating how the model autonomously learns to allocate more thinking compute.