Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG
Xueying Du, Geng Zheng, Kaixin Wang, Yi Zou, Yujia Wang, Wentai Deng, Jiayi Feng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou
TL;DR
This work shows that LLMs struggle to distinguish vulnerable code from its patched variants because they rely on shallow textual cues. It introduces Vul-RAG, a knowledge-level retrieval-augmented generation framework that distills multi-dimensional vulnerability knowledge from historical CVEs into a structured knowledge base and uses retrieval-guided reasoning to enhance vulnerability detection. Across the large LLMs evaluated, Vul-RAG achieves substantial gains in pairwise accuracy and balanced metrics, and user studies indicate that the generated knowledge improves manual vulnerability understanding. A real-world Linux kernel case study demonstrates Vul-RAG's ability to uncover previously unknown bugs and contribute to patches, underscoring the practical value of knowledge-level guidance for software security.
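To make "retrieval-guided reasoning over a structured knowledge base" more concrete, here is a toy Python sketch. It is not Vul-RAG's implementation. It assumes each knowledge item carries three hypothetical fields (what the vulnerable code does, why it was vulnerable, and how the patch fixed it), and that the query is a short natural-language description of what the code under review does. Jaccard token overlap is a deliberately simple stand-in for whatever retriever the framework actually uses. `KnowledgeItem`, `retrieve`, `build_prompt`, and the `KB-*` entries are all illustrative.

```python
import re
from dataclasses import dataclass


@dataclass
class KnowledgeItem:
    """One entry of a hypothetical vulnerability knowledge base."""
    item_id: str
    functional_semantics: str  # what the vulnerable code does (assumed field)
    vulnerability_cause: str   # root cause of the flaw (assumed field)
    fixing_solution: str       # how the patch removed the flaw (assumed field)


def tokens(text: str) -> set[str]:
    """Lower-cased identifier-like tokens of a text."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, kb: list[KnowledgeItem], k: int = 2) -> list[KnowledgeItem]:
    """Return the k items whose functional semantics best overlap the query."""
    q = tokens(query)
    ranked = sorted(
        kb, key=lambda item: jaccard(q, tokens(item.functional_semantics)), reverse=True
    )
    return ranked[:k]


def build_prompt(code: str, items: list[KnowledgeItem]) -> str:
    """Ask the LLM to check each retrieved cause and fix against the code."""
    parts = [f"Code under review:\n{code}\n"]
    for item in items:
        parts.append(
            f"[{item.item_id}] Cause: {item.vulnerability_cause}\n"
            f"Fix: {item.fixing_solution}\n"
            "Does the code exhibit this cause without applying this fix?"
        )
    return "\n".join(parts)


# Illustrative, made-up knowledge base entries.
kb = [
    KnowledgeItem(
        "KB-1",
        "frees a network buffer after releasing a socket lock",
        "another thread can still access the buffer (use after free)",
        "hold the lock until the buffer is no longer referenced",
    ),
    KnowledgeItem(
        "KB-2",
        "copies a user supplied length into a fixed size array",
        "missing bounds check on the length",
        "validate the length against the array size before copying",
    ),
]

hits = retrieve("copies a user supplied length field into a stack array", kb, k=1)
print(hits[0].item_id)
print(build_prompt("memcpy(buf, src, len);", hits))
```

Retrieving by what the code does, and then asking the model to reason about the stored cause and fix, is one plausible reading of "knowledge-level" retrieval as opposed to matching on raw code text; the sketch only illustrates that shape.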
Abstract
Although LLMs have shown promising potential in vulnerability detection, this study reveals their limitations in distinguishing vulnerable code from similar-but-benign patched code (only 0.06 to 0.14 accuracy). The results indicate that LLMs struggle to capture the root causes of vulnerabilities during detection. To address this challenge, we propose enhancing LLMs with multi-dimensional vulnerability knowledge distilled from historical vulnerabilities and their fixes. We design a novel knowledge-level Retrieval-Augmented Generation framework, Vul-RAG, which improves LLMs' accuracy in identifying vulnerable and patched code by 16% to 24%. Additionally, the vulnerability knowledge generated by Vul-RAG can (1) serve as high-quality explanations that improve manual detection accuracy (from 60% to 77%), and (2) help detect 10 previously unknown bugs in a recent Linux kernel release, 6 of which have been assigned CVEs.
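The low accuracy figures above make more sense once you see how strict a paired metric is. The Python sketch below assumes the pairwise definition mentioned in the TL;DR: a (vulnerable, patched) pair counts as correct only when the vulnerable version is flagged and the patched version is not. `PairPrediction` and `pairwise_accuracy` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairPrediction:
    """Detector verdicts for one (vulnerable, patched) code pair.

    True means the detector flagged that version as vulnerable.
    """
    vulnerable_flagged: bool  # verdict on the pre-patch version
    patched_flagged: bool     # verdict on the post-patch version


def pairwise_accuracy(pairs: list[PairPrediction]) -> float:
    """Fraction of pairs in which BOTH versions are classified correctly.

    Flagging both versions, or neither, earns no credit for that pair.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for p in pairs if p.vulnerable_flagged and not p.patched_flagged)
    return correct / len(pairs)


# A detector that flags everything has perfect recall but scores 0.0 here.
always_vulnerable = [PairPrediction(True, True)] * 4
print(pairwise_accuracy(always_vulnerable))  # 0.0

mixed = [
    PairPrediction(True, False),   # correct pair
    PairPrediction(True, True),    # patched version wrongly flagged
    PairPrediction(False, False),  # vulnerable version missed
    PairPrediction(True, False),   # correct pair
]
print(pairwise_accuracy(mixed))  # 0.5
```

Under this definition, a model that reacts only to surface features shared by both versions (such as a risky API call that the patch leaves in place) flags both halves of the pair and receives no credit. This is consistent with the observation that LLMs miss root causes.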
