Vul-R2: A Reasoning LLM for Automated Vulnerability Repair
Xin-Cheng Wen, Zirui Lin, Yijun Yang, Cuiyun Gao, Deheng Ye
TL;DR
Vul-R2 introduces a reasoning-based LLM framework for automated vulnerability repair (AVR), addressing two gaps: the lack of vulnerability-specific reasoning data and the absence of verifiable intermediate feedback during training. It combines Domain-Aware Reasoning Learning (DARL) with Curriculum-based Verifiable Rewarded Training (CVRT): Reasoning Answer Construction (RAC), data filtering, and domain-aware supervised fine-tuning (SFT) are followed by a two-stage reinforcement learning with verifiable rewards (RLVR) process that progressively strengthens verifiable reasoning. Experimental results on PrimeVul and SVEN show Vul-R2 achieving state-of-the-art performance on both EM and CodeBLEU, repairing significantly more vulnerabilities than CodePTMs and standard LLM baselines, and remaining robust across vulnerability types. The findings highlight the importance of explicit vulnerability-domain knowledge and structured, verifiable reasoning for effective AVR with real-world impact.
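For readers unfamiliar with the EM metric mentioned above, the following is a minimal sketch of how exact match is commonly computed over generated repairs. The whitespace normalization and the names `normalize` and `exact_match_rate` are assumptions made for illustration; they are not taken from the Vul-R2 evaluation setup, and the toy C snippets are invented examples.

```python
def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting-only differences are ignored.

    This normalization is an assumption; an evaluation may instead compare
    raw strings or token sequences.
    """
    return " ".join(code.split())


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions identical to their reference repair."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Toy data (hypothetical): the first repair differs from its reference only
# in line breaks and indentation, the second omits a statement.
predictions = [
    "if (len < size) memcpy(dst, src, len);",
    "free(p);",
]
references = [
    "if (len < size)\n    memcpy(dst, src, len);",
    "free(p); p = NULL;",
]

score = exact_match_rate(predictions, references)
print(score)  # 0.5
```

EM is strict by design: a repair that is semantically correct but textually different from the reference counts as a miss, which is one reason it is typically reported alongside a softer similarity metric such as CodeBLEU.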
Abstract
The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. Recent research has formulated AVR as a sequence generation problem and has leveraged large language models (LLMs) to address it. Typically, these approaches prompt or fine-tune LLMs to generate repairs for vulnerabilities directly. Although these methods show state-of-the-art performance, they face the following challenges: (1) Lack of high-quality, vulnerability-related reasoning data. Current approaches primarily rely on foundation models that mainly encode general programming knowledge. Without vulnerability-related reasoning data, they often fail to capture the diverse vulnerability repair patterns. (2) Difficulty in verifying the intermediate vulnerability repair process during LLM training. Existing reinforcement learning methods often leverage intermediate execution feedback from the environment (e.g., sandbox-based execution results) to guide training. In contrast, the vulnerability repair process generally lacks such intermediate, verifiable feedback, which poses additional challenges for model training.
