VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection

Xin-Cheng Wen, Xinchen Wang, Yujia Chen, Ruida Hu, David Lo, Cuiyun Gao

TL;DR

VulEval is a repository-level evaluation framework, paired with a large CVE-driven dataset, that addresses a gap in how software vulnerability detection is evaluated: it covers both intra- and inter-procedural vulnerabilities. It integrates three tasks (function-level vulnerability detection, vulnerability-related dependency retrieval, and repository-level vulnerability detection) to reflect real-world developer workflows. Empirical results show that incorporating repository context and dependencies improves detection, with larger models and LLMs offering the most benefit for repository-level tasks, while retrieval quality remains a bottleneck. The work highlights practical implications for vulnerability tooling and outlines directions for future research in dependency retrieval and prompt-based integration.

Abstract

Deep Learning (DL)-based methods have proven effective for software vulnerability detection and have the potential to substantially improve productivity in finding vulnerabilities. Current methods mainly focus on detecting vulnerabilities within single functions (i.e., intra-procedural vulnerabilities), ignoring the more complex inter-procedural vulnerability detection scenarios that arise in practice. For example, developers routinely use program analysis to detect vulnerabilities that span multiple functions within repositories. In addition, the widely used benchmark datasets generally contain only intra-procedural vulnerabilities, leaving the assessment of inter-procedural vulnerability detection capabilities unexplored. To address these issues, we propose a repository-level evaluation system, named VulEval, which aims to evaluate the detection performance of inter- and intra-procedural vulnerabilities simultaneously. Specifically, VulEval consists of three interconnected evaluation tasks: (1) Function-Level Vulnerability Detection, which aims to detect intra-procedural vulnerabilities given a code snippet; (2) Vulnerability-Related Dependency Prediction, which aims to retrieve the most relevant dependencies from call graphs to provide developers with explanations of the vulnerabilities; and (3) Repository-Level Vulnerability Detection, which aims to detect inter-procedural vulnerabilities by combining the code with the dependencies identified in the second task. VulEval also includes a large-scale dataset, with a total of 4,196 CVE entries, 232,239 functions, and the corresponding 4,699 repository-level source code in the C/C++ programming languages. Our analysis highlights the current progress and future directions for software vulnerability detection.
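
The way the three tasks feed into one another can be illustrated with a minimal Python sketch. It is not the VulEval implementation: the names (`retrieve_dependencies`, `toy_detector`, `SOURCES`, `CALL_GRAPH`) are hypothetical, a token-overlap (Jaccard) ranking stands in for whatever retriever is actually used, and a trivial string rule stands in for a learned detector. The toy C snippets are invented for illustration only.

```python
import re
from typing import Callable, Dict, List, Set

def tokens(code: str) -> Set[str]:
    """Identifier-like tokens of a code string (a crude lexical view)."""
    return set(re.findall(r"[A-Za-z_]\w*", code))

def candidate_dependencies(target: str, call_graph: Dict[str, List[str]]) -> List[str]:
    """Callees and callers of `target`, read off a call graph (caller -> callees)."""
    callees = call_graph.get(target, [])
    callers = [f for f, outs in call_graph.items() if target in outs]
    ordered: List[str] = []
    for name in callees + callers:
        if name != target and name not in ordered:
            ordered.append(name)
    return ordered

def retrieve_dependencies(target: str, sources: Dict[str, str],
                          call_graph: Dict[str, List[str]], k: int = 1) -> List[str]:
    """Task 2 (assumed form): rank call-graph neighbours by Jaccard token overlap."""
    target_tokens = tokens(sources[target])

    def jaccard(name: str) -> float:
        other = tokens(sources[name])
        union = target_tokens | other
        return len(target_tokens & other) / len(union) if union else 0.0

    ranked = sorted(candidate_dependencies(target, call_graph), key=jaccard, reverse=True)
    return ranked[:k]

def toy_detector(code: str) -> bool:
    """Placeholder for a learned model: flags memcpy with no visible bounds check."""
    return "memcpy" in code and "len <=" not in code

def detect_function_level(target: str, sources: Dict[str, str],
                          detector: Callable[[str], bool]) -> bool:
    """Task 1: the detector sees only the target function."""
    return detector(sources[target])

def detect_repository_level(target: str, sources: Dict[str, str],
                            call_graph: Dict[str, List[str]],
                            detector: Callable[[str], bool], k: int = 1) -> bool:
    """Task 3: the detector sees the target plus its retrieved dependencies."""
    deps = retrieve_dependencies(target, sources, call_graph, k)
    context = "\n".join([sources[target]] + [sources[d] for d in deps])
    return detector(context)

# Invented toy repository: the unchecked copy lives in the callee.
SOURCES = {
    "parse_packet": 'void parse_packet(pkt *p) { char buf[64]; if (!p) { log_error("null"); return; } '
                    "copy_payload(buf, p->data, p->len); }",
    "copy_payload": "void copy_payload(char *dst, char *src, int len) { memcpy(dst, src, len); }",
    "log_error": "void log_error(char *msg) { puts(msg); }",
}
CALL_GRAPH = {"parse_packet": ["copy_payload", "log_error"], "copy_payload": [], "log_error": []}

function_level = detect_function_level("parse_packet", SOURCES, toy_detector)
repository_level = detect_repository_level("parse_packet", SOURCES, CALL_GRAPH, toy_detector)
```

In this toy setting the function-level check misses the problem because `parse_packet` contains no copy itself, whereas the repository-level check sees the retrieved callee and flags it. This mirrors, in a very simplified way, why the retrieval step matters: if the wrong dependency is retrieved, the added context cannot help.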

Paper Structure (37 sections, 3 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: An example of an inter-procedural CWE-20 vulnerability. Lines highlighted in green denote the call relation (i.e., callee and caller), and lines highlighted in red denote the vulnerable statements.
  • Figure 2: The four types of vulnerability detection methods.
  • Figure 3: Overview of VulEval. Panels (a), (b), (c), and (d) show the process of data collection, function-level vulnerability detection, vulnerability-related dependency prediction, and repository-level vulnerability detection, respectively.
  • Figure 4: CWE-190
  • Figure 5: CWE-400
  • ...and 3 more figures