Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Xueying Du, Geng Zheng, Kaixin Wang, Yi Zou, Yujia Wang, Wentai Deng, Jiayi Feng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

TL;DR

This work shows that LLMs struggle to distinguish vulnerable code from its patched variants because they rely on shallow textual cues. It introduces Vul-RAG, a knowledge-level retrieval-augmented generation framework that distills multi-dimensional vulnerability knowledge from historical CVEs into a structured knowledge base and uses retrieval-guided reasoning to improve vulnerability detection. Across the large LLMs evaluated, Vul-RAG achieves substantial gains in pairwise accuracy and balanced metrics, and user studies indicate that its knowledge improves manual vulnerability understanding. A real-world Linux kernel case study shows that Vul-RAG can uncover previously unknown bugs and contribute to their patches, underscoring the practical value of knowledge-level guidance for software security.
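The pairwise accuracy mentioned above can be made concrete with a short Python sketch. It assumes the usual definition for paired benchmarks: a (vulnerable, patched) pair counts as correct only when the detector flags the vulnerable version and does not flag the patched one. The class and function names are hypothetical, and the paper's exact metric definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairPrediction:
    """Detector verdicts for one (vulnerable, patched) code pair (hypothetical type).

    True means "flagged as vulnerable".
    """
    vulnerable_flagged: bool  # verdict on the pre-patch (vulnerable) version
    patched_flagged: bool     # verdict on the post-patch (benign) version


def pairwise_accuracy(preds: list[PairPrediction]) -> float:
    """Fraction of pairs where the vulnerable version is flagged AND the
    patched version is not, i.e. the detector actually tells the two apart."""
    if not preds:
        raise ValueError("need at least one pair")
    correct = sum(p.vulnerable_flagged and not p.patched_flagged for p in preds)
    return correct / len(preds)


def sample_accuracy(preds: list[PairPrediction]) -> float:
    """Ordinary per-sample accuracy over both members of every pair, for contrast."""
    if not preds:
        raise ValueError("need at least one pair")
    hits = sum(p.vulnerable_flagged for p in preds) + sum(not p.patched_flagged for p in preds)
    return hits / (2 * len(preds))
```

For example, a detector that flags every function (`[PairPrediction(True, True)] * 4`) scores 0.5 on per-sample accuracy but 0.0 pairwise, which is why a pairwise metric exposes a model that reacts to surface features shared by both versions.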

Abstract

Although LLMs have shown promising potential in vulnerability detection, this study reveals their limited ability to distinguish vulnerable code from similar-but-benign patched code (only 0.06 to 0.14 accuracy). This indicates that LLMs struggle to capture the root causes of vulnerabilities. To address this challenge, we propose enhancing LLMs with multi-dimensional vulnerability knowledge distilled from historical vulnerabilities and their fixes. We design Vul-RAG, a novel knowledge-level Retrieval-Augmented Generation framework that increases LLM accuracy in identifying vulnerable and patched code by 16% to 24%. Additionally, the vulnerability knowledge generated by Vul-RAG can (1) serve as high-quality explanations that improve manual detection accuracy (from 60% to 77%), and (2) detect 10 previously unknown bugs in a recent Linux kernel release, 6 of which have been assigned CVEs.
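The abstract does not spell out Vul-RAG's knowledge schema or retriever, so the following Python sketch only illustrates the general shape of knowledge-level RAG under stated assumptions. Each knowledge item is assumed to record the code's functional semantics, the vulnerability cause, and the fixing solution; retrieval is a simple token-overlap (Jaccard) ranking standing in for whatever retriever the paper actually uses; and the LLM call is omitted, leaving only prompt construction. All names (`VulnKnowledge`, `retrieve`, `build_prompt`) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnKnowledge:
    """One knowledge item distilled from a historical CVE and its fix.
    Field names are hypothetical illustrations of 'multi-dimensional' knowledge."""
    cve_id: str
    functional_semantics: str  # assumed: what the vulnerable code is meant to do
    vulnerability_cause: str   # assumed: root cause, abstracted from identifiers
    fixing_solution: str       # assumed: how the patch removed the cause


def _tokens(text: str) -> set[str]:
    """Lower-cased word tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(query: str, kb: list[VulnKnowledge], k: int = 3) -> list[VulnKnowledge]:
    """Rank knowledge items by Jaccard overlap between the query (a description
    of the code under test) and each item's functional semantics."""
    q = _tokens(query)

    def score(item: VulnKnowledge) -> float:
        t = _tokens(item.functional_semantics)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(kb, key=score, reverse=True)[:k]


def build_prompt(code: str, item: VulnKnowledge) -> str:
    """Ask an LLM to reason about root cause and fix, not surface similarity."""
    return (
        f"Reference knowledge ({item.cve_id}):\n"
        f"- Vulnerability cause: {item.vulnerability_cause}\n"
        f"- Fixing solution: {item.fixing_solution}\n\n"
        f"Code under test:\n{code}\n\n"
        "Answer YES if the code exhibits the cause AND lacks the fix; otherwise NO."
    )
```

Framing the question around cause and fix, rather than raw code similarity, is the point of a knowledge-level approach in this sketch: a patched function can look almost identical to its vulnerable predecessor while no longer exhibiting the cause.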

Paper Structure (35 sections, 2 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Overview of Vul-RAG
  • Figure 2: An Example of Vulnerability Knowledge Extraction from CVE-2022-38457
  • Figure 3: Comparison of performance for Vul-RAG and Baselines
  • Figure 4: Examples of vulnerable code and similar-but-benign patched code
  • Figure 5: An example of vulnerability knowledge representation
  • ...and 2 more figures