LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward

Nafis Tanveer Islam; Joseph Khoury; Andrew Seong; Mohammad Bahrami Karkevandi; Gonzalo De La Torre Parra; Elias Bou-Harb; Peyman Najafirad

TL;DR

SecRepair tackles the security gap in AI-assisted software development by enabling automatic vulnerability identification, repair, and descriptive analysis through an LLM-powered pipeline built on CodeGen2. It introduces InstructVul, an instruction-based dataset, and an RL-based semantic reward mechanism for generating concise, commit-ready code comments. The approach is validated on six open-source IoT operating systems, where it uncovers zero-day and N-day vulnerabilities, and it improves vulnerability repair and description quality over baselines. These results demonstrate a practical pathway to educate developers about code security and to reduce the insecure code propagated by AI tools.
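The TL;DR describes InstructVul as an instruction-based dataset, and the Figure 2 caption describes each sample as a triplet of instruction, context input, and output. The sketch below shows one way such a triplet could be represented and split into a prompt/target pair for supervised fine-tuning. The class name `InstructionSample`, the `### Instruction:` prompt template, the example C function, and its target text are all hypothetical illustrations, not the dataset's actual schema or content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionSample:
    """A hypothetical instruction-style record: (instruction, context input, output)."""
    instruction: str  # task the model is asked to perform
    context: str      # code under analysis (the context input)
    output: str       # target response the model should learn to produce


# Hypothetical prompt template (assumption); the dataset's real format may differ.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{context}\n\n"
    "### Response:\n"
)


def to_prompt_and_target(sample: InstructionSample) -> tuple[str, str]:
    """Split a sample into the prompt the model sees and the target it must generate.

    Keeping them separate makes it easy to compute the training loss on the
    target tokens only.
    """
    prompt = PROMPT_TEMPLATE.format(
        instruction=sample.instruction, context=sample.context
    )
    return prompt, sample.output


# Illustrative example only: an unbounded strcpy and a bounded rewrite.
example = InstructionSample(
    instruction="Is the following C function vulnerable? If so, repair it.",
    context="void copy(char *dst, const char *src) { strcpy(dst, src); }",
    output=(
        "Vulnerable: strcpy does not check the destination size (buffer overflow).\n"
        "void copy(char *dst, size_t n, const char *src) "
        "{ snprintf(dst, n, \"%s\", src); }"
    ),
)

prompt, target = to_prompt_and_target(example)
print(prompt + target)
```

Separating the prompt from the target is a common design choice for instruction tuning, since it lets the loss ignore the instruction and code the model is only conditioned on.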

Abstract

In software development, the predominant emphasis on functionality often supersedes security concerns, a trend gaining momentum with AI-driven automation tools like GitHub Copilot. These tools significantly improve developers' efficiency in functional code development. Nevertheless, such tools are also a notable source of insecure code, largely because they are pre-trained on publicly available repositories that contain vulnerable code. Moreover, developers are often called the "weakest link in the chain" because many have minimal knowledge of code security. Although existing solutions offer reasonable fixes for vulnerable code, they must also adequately describe the vulnerability and educate developers on code security so that the same security issues are not repeated. Therefore, we introduce SecRepair, a multipurpose code vulnerability analysis system powered by the large language model CodeGen2, which assists developers in identifying vulnerable code and generating fixed code, along with a complete description of the vulnerability in a code comment. Our methodology uses a reinforcement learning paradigm, augmented by a semantic reward mechanism, to generate code comments. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We further identify zero-day and N-day vulnerabilities in six open-source IoT operating systems on GitHub. Our findings show that incorporating reinforcement learning with a semantic reward improves our model's performance, strengthening its ability to address code vulnerabilities.
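The abstract says code comments are generated with reinforcement learning augmented by a semantic reward, but it does not state how that reward is computed. As a minimal sketch, assume the reward is the cosine similarity between embeddings of a generated comment and a reference comment, and that it drives a REINFORCE-style policy-gradient update with a baseline. The bag-of-words `embed` function is a toy stand-in for a learned sentence encoder, and all function names are hypothetical; this is not the paper's reward.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_reward(generated: str, reference: str) -> float:
    """Hypothetical semantic reward: similarity of generated and reference comments."""
    return cosine(embed(generated), embed(reference))


def reinforce_loss(token_log_probs: list[float], reward: float, baseline: float) -> float:
    """REINFORCE with a baseline for one sampled comment.

    Minimizing this loss raises the log-probability of comments whose reward
    exceeds the baseline and lowers it for comments below the baseline. In a
    real training loop the log-probabilities would be autograd tensors from
    the policy model rather than plain floats.
    """
    advantage = reward - baseline
    return -advantage * sum(token_log_probs)


reference = "bound the copy length to prevent a buffer overflow"
generated = "prevent buffer overflow by bounding the copy length"
r = semantic_reward(generated, reference)
loss = reinforce_loss([-0.4, -0.9, -1.2], reward=r, baseline=0.5)
print(f"reward={r:.3f} loss={loss:.3f}")
```

A similarity-based reward credits paraphrases that an exact n-gram match would miss, which is the usual motivation for a semantic rather than lexical reward.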

Paper Structure (20 sections, 3 equations, 4 figures, 4 tables)

Introduction
Related Work
Code Vulnerability with LLMs.
Vulnerability Repair.
Approach
Code Vulnerability Repair and Description
Reinforcement Learning for Code Comment
Instruction Dataset
Formal Definition
Dataset Creation
Input Generation, ($I$, $C_i$).
Output Generation, ($y$).
Experiments and Discussions
Evaluation Metrics
BLEU Score:
...and 5 more sections

Figures (4)

Figure 1: An illustrative example of the inputs and outputs of our instruction-style model across four tasks: vulnerability identification, repair, description, and code comment generation.
Figure 2: The overall architecture of SecRepair, which has three stages: (1) Instruction Dataset Preparation: creating triplets of instruction, context input, and output; (2) Code Repair and Detection: training the model for vulnerable code identification, repair, and description; (3) Fine-Tuning using Reinforcement Learning: further fine-tuning the model for code comment generation using reinforcement learning with a semantic reward.
Figure 3: Ablation study of our model's performance with varying temperature.
Figure 4: Ablation study of our model's performance with varying beam size.
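Figure 3 varies the sampling temperature at generation time. For reference, temperature is commonly applied by dividing the logits by T before the softmax: T < 1 sharpens the next-token distribution toward the most likely token, and T > 1 flattens it. The minimal sketch below shows that standard mechanism with hypothetical function names and toy logits; it does not reproduce the paper's decoding setup or the values it tested.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Sample one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # toy next-token scores
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print(sample_token(logits, temperature=0.7, rng=rng))
```

Beam search, the subject of Figure 4, is a different decoding strategy: instead of sampling, it keeps the beam-size highest-scoring partial sequences by cumulative log-probability at each step.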