Context-Enhanced Vulnerability Detection Based on Large Language Model
Yixin Yang, Bowen Xu, Xiang Gao, Hailong Sun
TL;DR
The paper tackles the challenge of detecting inter-procedural memory-safety vulnerabilities by combining program analysis with large language models. It introduces PacVD, a context-enhanced vulnerability detection framework that uses Code Property Graphs and Primitive API Abstraction to provide LLMs with relevant cross-function context, and systematically studies how abstraction level and prompting strategies affect performance across multiple models. Empirical results show that a mid-to-high level API abstraction (A3) generally yields the best balance between information richness and noise, with prompting strategies like in-context learning and chain-of-thought enhancing detection for higher abstraction levels. Across models, PacVD outperforms baselines, achieving higher F1 scores and better precision-recall balance, and demonstrating practical value for software security workflows. The work also outlines practical guidelines for API abstraction selection, model-strategy pairing, and prompt design, and highlights directions for future research in multi-modal inputs, language diversity, and deployment considerations.
Abstract
Vulnerability detection is a critical aspect of software security. Accurate detection is essential to prevent potential security breaches and protect software systems from malicious attacks. Recently, vulnerability detection methods leveraging deep learning and large language models (LLMs) have garnered increasing attention. However, existing approaches often focus on analyzing individual files or functions, which limits their ability to gather sufficient contextual information. Conversely, analyzing entire repositories to gather context introduces significant noise and computational overhead. To address these challenges, we propose a context-enhanced vulnerability detection approach that combines program analysis with LLMs. Specifically, we use program analysis to extract contextual information at various levels of abstraction, thereby filtering out irrelevant noise. The abstracted context is then provided to the LLM, together with the source code, for vulnerability detection. We investigate how different levels of contextual granularity affect LLM-based vulnerability detection performance. Our goal is to strike a balance between providing sufficient detail to accurately capture vulnerabilities and minimizing unnecessary complexity that could hinder model performance. Based on an extensive study using GPT-4, DeepSeek, and CodeLLaMA with various prompting strategies, our key findings include: (1) incorporating abstracted context significantly enhances vulnerability detection effectiveness; (2) different models benefit from distinct levels of abstraction depending on their code understanding capabilities; and (3) capturing program behavior through program analysis is a promising direction for general LLM-based code analysis tasks and warrants further attention.
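To make the idea of abstracted context concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical `PRIMITIVES` table that maps concrete API names to primitive memory operations, and it replaces real program analysis (the paper uses Code Property Graphs) with a simple regular expression over callee source text. The function names `abstract_callee` and `build_prompt` and the prompt wording are illustrative assumptions.

```python
import re

# Hypothetical abstraction table: concrete API name -> primitive operation.
# The entries are illustrative and are not the paper's taxonomy.
PRIMITIVES = {
    "malloc": "ALLOC",
    "kmalloc": "ALLOC",
    "free": "FREE",
    "kfree": "FREE",
    "memcpy": "COPY",
}

# Matches identifiers that are directly followed by an opening parenthesis.
# This is a crude stand-in for call extraction from a real program graph.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def abstract_callee(name, body):
    """Summarize a callee as the ordered primitive operations it performs."""
    ops = [PRIMITIVES[c] for c in CALL_RE.findall(body) if c in PRIMITIVES]
    return f"{name}: " + (" -> ".join(ops) if ops else "no memory primitives")


def build_prompt(target_src, callees):
    """Combine the target function with abstracted callee context."""
    context = "\n".join(abstract_callee(n, b) for n, b in callees.items())
    return (
        "Context (abstracted callees):\n" + context + "\n\n"
        "Target function:\n" + target_src + "\n\n"
        "Does the target function contain a memory-safety vulnerability?"
    )


if __name__ == "__main__":
    callees = {
        "release_buf": "void release_buf(struct buf *b) { kfree(b->data); kfree(b); }",
    }
    target = "void f(struct buf *b) { release_buf(b); b->len = 0; }"
    print(build_prompt(target, callees))
```

In this sketch the model sees only a one-line summary of `release_buf` rather than its full body, which reflects the trade-off the abstract describes: enough cross-function detail to expose a possible use-after-free in `f`, without the noise of the full callee source.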
