Context-Enhanced Vulnerability Detection Based on Large Language Model

Yixin Yang, Bowen Xu, Xiang Gao, Hailong Sun

TL;DR

The paper tackles the challenge of detecting inter-procedural memory-safety vulnerabilities by combining program analysis with large language models. It introduces PacVD, a context-enhanced vulnerability detection framework that uses Code Property Graphs and Primitive API Abstraction to provide LLMs with relevant cross-function context, and systematically studies how abstraction level and prompting strategies affect performance across multiple models. Empirical results show that a mid-to-high level API abstraction (A3) generally yields the best balance between information richness and noise, with prompting strategies like in-context learning and chain-of-thought enhancing detection for higher abstraction levels. Across models, PacVD outperforms baselines, achieving higher F1 scores and better precision-recall balance, and demonstrating practical value for software security workflows. The work also outlines practical guidelines for API abstraction selection, model-strategy pairing, and prompt design, and highlights directions for future research in multi-modal inputs, language diversity, and deployment considerations.
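The TL;DR describes giving the LLM cross-function context by abstracting what called functions do in terms of primitive APIs. The sketch below illustrates that idea under loudly stated assumptions. The call graph is a plain dictionary standing in for a real Code Property Graph. `PRIMITIVE_APIS` and `abstract_callees` are hypothetical names. The breadth-first summary is one plausible reading of "Primitive API Abstraction". It is not PacVD's actual algorithm and does not reproduce the paper's abstraction levels, such as A3.

```python
from collections import deque

# HYPOTHETICAL list of "primitive" memory-related APIs; the paper's actual set may differ.
PRIMITIVE_APIS = {"malloc", "calloc", "realloc", "free", "memcpy",
                  "memmove", "memset", "strcpy", "strncpy"}


def abstract_callees(call_graph: dict[str, list[str]], target: str,
                     max_depth: int = 3) -> dict[str, list[str]]:
    """For each direct callee of `target`, return the sorted primitive APIs reachable
    from it in at most `max_depth` call edges (breadth-first search).

    `call_graph` maps a function name to the names it calls. In a real pipeline this
    would be derived from a Code Property Graph; here it is assumed to be a plain dict."""
    summary: dict[str, list[str]] = {}
    for callee in call_graph.get(target, []):
        reached: set[str] = set()
        seen = {callee}
        queue = deque([(callee, 0)])
        while queue:
            fn, depth = queue.popleft()
            if fn in PRIMITIVE_APIS:
                reached.add(fn)   # stop at a primitive: its semantics are already known
                continue
            if depth == max_depth:
                continue          # depth budget exhausted along this path
            for nxt in call_graph.get(fn, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        summary[callee] = sorted(reached)
    return summary


# Toy call graph, invented for illustration only.
graph = {
    "parse_packet": ["read_header", "copy_payload", "log_msg"],
    "read_header": ["buf_alloc"],
    "buf_alloc": ["malloc"],
    "copy_payload": ["memcpy"],
    "log_msg": ["printf"],
}
print(abstract_callees(graph, "parse_packet"))
# {'read_header': ['malloc'], 'copy_payload': ['memcpy'], 'log_msg': []}
```

Lowering `max_depth` gives the model less context, because helpers that reach a primitive only beyond the budget report nothing. This is one crude knob for trading information against noise. It is not how the paper defines its abstraction levels.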

Abstract

Vulnerability detection is a critical aspect of software security. Accurate detection is essential to prevent security breaches and protect software systems from malicious attacks. Recently, vulnerability detection methods leveraging deep learning and large language models (LLMs) have garnered increasing attention. However, existing approaches often analyze individual files or functions in isolation, which limits the contextual information available to them, while analyzing entire repositories to gather context introduces significant noise and computational overhead. To address these challenges, we propose a context-enhanced vulnerability detection approach that combines program analysis with LLMs. Specifically, we use program analysis to extract contextual information at various levels of abstraction, thereby filtering out irrelevant noise. The abstracted context, along with the source code, is provided to the LLM for vulnerability detection. We investigate how different levels of contextual granularity affect LLM-based vulnerability detection performance. Our goal is to strike a balance between providing enough detail to capture vulnerabilities accurately and avoiding unnecessary complexity that could hinder model performance. Based on an extensive study using GPT-4, DeepSeek, and CodeLLaMA with various prompting strategies, our key findings include: (1) incorporating abstracted context significantly enhances vulnerability detection effectiveness; (2) different models benefit from distinct levels of abstraction depending on their code understanding capabilities; and (3) capturing program behavior through program analysis for general LLM-based code analysis tasks is a direction that warrants further attention.
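The abstract states that the abstracted context is given to the LLM together with the source code, under several prompting strategies. The sketch below shows one way such a prompt might be assembled and a reply parsed. The function names, the template wording, and the VULNERABLE/SAFE answer format are all hypothetical and are not the paper's prompts. The `role_based` and `chain_of_thought` flags loosely mirror the role-based (RP) and chain-of-thought (CT) strategies named in Figure 2. No model is called; the prompt string would be sent to whichever LLM client you use.

```python
import re
from typing import Optional


def build_prompt(source: str, context: dict[str, list[str]],
                 role_based: bool = True, chain_of_thought: bool = True) -> str:
    """Assemble an illustrative prompt: optional role line, abstracted callee context,
    the target function's source, an optional step-by-step cue, and an answer format."""
    parts = []
    if role_based:
        parts.append("You are a security expert auditing C code for memory-safety bugs.")
    parts.append("Primitive APIs reached by each callee of the target function:")
    for callee, apis in context.items():
        parts.append(f"- {callee}: {', '.join(apis) if apis else 'none'}")
    parts.append("Target function:")
    parts.append(source)
    if chain_of_thought:
        parts.append("Think step by step about allocation sizes, copies, and frees.")
    parts.append("End your reply with exactly one word: VULNERABLE or SAFE.")
    return "\n".join(parts)


def parse_verdict(reply: str) -> Optional[bool]:
    """Return True for VULNERABLE, False for SAFE, None if neither word appears.
    Uses the LAST whole-word match, since step-by-step reasoning precedes the answer;
    word boundaries keep "UNSAFE" from matching "SAFE"."""
    matches = re.findall(r"\b(VULNERABLE|SAFE)\b", reply.upper())
    if not matches:
        return None
    return matches[-1] == "VULNERABLE"


prompt = build_prompt("void f(char *s) { char b[8]; strcpy(b, s); }",
                      {"strcpy": ["strcpy"]})
print(parse_verdict("strcpy into an 8-byte buffer is unsafe... VULNERABLE"))  # True
```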

Paper Structure

This paper contains 54 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Primitive API Abstraction and Context-Enhanced Vulnerability Detection Framework
  • Figure 2: F1 score for vulnerability detection using different API abstraction levels as auxiliary information. The horizontal axis represents different levels of API abstraction. The vertical axis indicates the model's F1 score for vulnerability detection. The curves in different colors represent distinct prompt engineering strategies. BP denotes basic prompt, RP denotes role-based prompt, CT denotes chain-of-thought, IC denotes in-context learning, FR denotes few-shot learning with randomly selected examples, and FC denotes few-shot learning with contrastive pair examples. Model Average represents the average F1 score of a given model across all prompt strategies.
  • Figure 3: The effect of different API abstraction levels on different vulnerability types. Bar height represents F1 score; higher is better. Arrows ($\uparrow$) indicate significant improvements over the No API baseline.
  • Figure 4: F1 score comparison across different vulnerability detection models. Star symbols ($\bigstar$) denote the top-performing model for each API abstraction level.
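Figures 2 to 4 report F1 scores, and Figure 2 adds a per-model "Model Average" across prompt strategies. The sketch below shows the standard F1 computation and an unweighted mean over strategies. Treating "Model Average" as a plain mean of per-strategy F1 values is an assumption; pooling confusion counts before computing F1 would generally give a different number. The function names and the counts are hypothetical and invented for illustration. They are not results from the paper.

```python
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when either is undefined or both are zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def model_average(f1_by_strategy: dict[str, float]) -> float:
    """Unweighted mean of per-strategy F1 scores (e.g. BP, RP, CT, IC, FR, FC)."""
    return mean(f1_by_strategy.values())


# Made-up (tp, fp, fn) counts, NOT results from the paper.
counts = {"BP": (40, 10, 20), "CT": (45, 15, 15)}
scores = {name: f1_score(*c) for name, c in counts.items()}
print(scores, model_average(scores))
```

Because F1 equals 2·TP / (2·TP + FP + FN), the BP entry above works out to 80/110, roughly 0.727.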