LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models

Mohamad Fakih; Rahul Dharmaji; Halima Bouzidi; Gustavo Quiros Araya; Oluwatosin Ogundare; Mohammad Abdullah Al Faruque

TL;DR

The paper tackles the challenge of quickly repairing real-world software vulnerabilities in legacy code by introducing LLM4CVE, an automated, iterative vulnerability repair pipeline that leverages Large Language Models with LoRA fine-tuning and prompt engineering. It integrates CVE/CWE context, CodeBLEU-driven feedback, and a three-stage repair loop to generate, evaluate, and refine patches, tested across GPT-3.5, GPT-4o, and Llama 3 variants on the CVEFixes dataset. Key contributions include a novel automated iterative repair method, a comprehensive evaluation showing improved CodeBLEU scores and human-judged patch quality, and a public release of testing apparatus, fine-tuned weights, and data. The approach promises significant practical impact by reducing engineering effort and enabling faster vulnerability remediation in legacy and safety-critical software systems.
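The generate, evaluate, and refine loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: `iterative_repair`, `toy_model`, the `max_rounds` and `target` parameters, and the feedback string are all hypothetical names chosen for illustration, and `difflib.SequenceMatcher` is used only as a simple stand-in for CodeBLEU, which is a different and richer metric.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Tuple


def similarity(candidate: str, reference: str) -> float:
    """Stand-in for CodeBLEU (assumption): character-level similarity in [0, 1]."""
    return SequenceMatcher(None, candidate, reference).ratio()


def iterative_repair(
    vulnerable_code: str,
    reference_fix: str,
    generate_patch: Callable[[str, Optional[str]], str],
    max_rounds: int = 3,
    target: float = 0.9,
) -> Tuple[str, float]:
    """Generate, evaluate, and refine a patch; return the best candidate seen."""
    best_patch = vulnerable_code
    best_score = similarity(vulnerable_code, reference_fix)
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate_patch(best_patch, feedback)  # stage 1: generate
        score = similarity(candidate, reference_fix)      # stage 2: evaluate
        if score > best_score:
            best_patch, best_score = candidate, score
        if best_score >= target:
            break
        # stage 3: refine, by passing the score back into the next prompt
        feedback = f"previous attempt scored {score:.2f}; revise the patch"
    return best_patch, best_score


def toy_model(code: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call: swaps an unsafe copy for a bounded one."""
    return code.replace("strcpy(dst, src)", "strncpy(dst, src, sizeof(dst) - 1)")


vulnerable = "void f(char *src) { char dst[8]; strcpy(dst, src); }"
reference = "void f(char *src) { char dst[8]; strncpy(dst, src, sizeof(dst) - 1); }"
patch, score = iterative_repair(vulnerable, reference, toy_model)
```

In a real setting, `generate_patch` would wrap a prompt to a model such as those evaluated in the paper, and the scorer would be replaced by an actual CodeBLEU implementation; the loop structure is the only part this sketch is meant to convey.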

Abstract

Software vulnerabilities continue to be ubiquitous, even in the era of AI-powered code assistants, advanced static analysis tools, and the adoption of extensive testing frameworks. It has become apparent that we must not simply prevent these bugs, but also eliminate them quickly and efficiently. Yet human code intervention is slow, costly, and can often lead to further security vulnerabilities, especially in legacy codebases. The advent of highly advanced Large Language Models (LLMs) has opened up the possibility for many software defects to be patched automatically. We propose LLM4CVE, an LLM-based iterative pipeline that robustly fixes vulnerable functions in real-world code with high accuracy. We examine our pipeline with state-of-the-art LLMs, such as GPT-3.5, GPT-4o, Llama 3 8B, and Llama 3 70B. We achieve a human-verified quality score of 8.51/10 and an increase in ground-truth code similarity of 20% with Llama 3 70B. To promote further research in the area of LLM-based vulnerability repair, we publish our testing apparatus, fine-tuned weights, and experimental data on our website.

Paper Structure (43 sections, 6 figures, 5 tables)

This paper contains 43 sections, 6 figures, 5 tables.

Introduction
Background
CVEs & CWEs
Vulnerability Analysis & Repair
Large Language Models
LLM Augmentation
Prompt Engineering
Related Works
Classical Automated Vulnerability Detection & Repair
LLM-Driven Code Generation
LLM-Guided Vulnerability Detection and Repair
Methodology
CVE Selection
Prompting
Prompt Engineering
...and 28 more sections

Figures (6)

Figure 1: How LLM4CVE assists in preventing bug exploitation
Figure 2: Rectification of software vulnerabilities often follows a predefined cycle, which LLM4CVE augments for faster turnaround times
Figure 3: LoRAs enable the fine-tuning of LLMs with a comparatively low computational cost
Figure 4: A visualization of how the LLM4CVE pipeline can automatically fix common software vulnerabilities
Figure 5: LLM4CVE uses iterative generation to improve the overall quality of patch synthesis
...and 1 more figures
