On the Effectiveness of Function-Level Vulnerability Detectors for Inter-Procedural Vulnerabilities
Zhen Li, Ning Wang, Deqing Zou, Yating Li, Ruqian Zhang, Shouhuai Xu, Chao Zhang, Hai Jin
TL;DR
Function-level vulnerability detectors treat each function in isolation and can therefore miss inter-procedural vulnerabilities. This study introduces VulTrigger, which identifies vulnerability-triggering statements through patch analysis and inter-procedural program slicing. Using it, the authors build InterPVD, a dataset of 769 vulnerabilities spanning 53 projects and 16 CWEs, and find that 24.3% of them are inter-procedural, with an average of 2.8 inter-procedural layers. Evaluations of five function-level detectors show markedly reduced effectiveness on inter-procedural vulnerabilities compared with intra-procedural ones, which points to the need for slice- and patch-informed approaches. The dataset and tool provide a benchmark for studying and detecting cross-function vulnerabilities.
Abstract
Software vulnerabilities are a major cyber threat, and detecting them is important. One prominent approach uses deep learning and treats each program function as a whole; such tools are known as function-level vulnerability detectors. However, the limitations of this approach are not well understood. In this paper, we investigate its limitations in detecting one class of vulnerabilities, known as inter-procedural vulnerabilities, in which the to-be-patched statements and the vulnerability-triggering statements belong to different functions. For this purpose, we create the first Inter-Procedural Vulnerability Dataset (InterPVD), based on C/C++ open-source software, and we propose a tool dubbed VulTrigger for identifying vulnerability-triggering statements across functions. Experimental results show that VulTrigger can effectively identify vulnerability-triggering statements and inter-procedural vulnerabilities. Our findings include: (i) inter-procedural vulnerabilities are prevalent, with an average of 2.8 inter-procedural layers; and (ii) function-level vulnerability detectors are much less effective at detecting the to-be-patched functions of inter-procedural vulnerabilities than at detecting their counterparts in intra-procedural vulnerabilities.
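The distinction drawn above between intra- and inter-procedural vulnerabilities can be illustrated with a minimal sketch. It is not VulTrigger and not the paper's method: the `Vulnerability` record, the function names, and the toy call graph are all hypothetical, and counting call edges between the two functions is only one assumed way to think about how far apart the patch and the trigger are, not the paper's definition of an inter-procedural layer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    # Hypothetical record: names of the functions that hold each kind of statement.
    patched_function: str      # contains the to-be-patched statements
    triggering_function: str   # contains the vulnerability-triggering statements


def is_inter_procedural(vuln: Vulnerability) -> bool:
    """Inter-procedural means the patch and the trigger live in different functions."""
    return vuln.patched_function != vuln.triggering_function


def call_chain_length(call_graph, source, target):
    """Fewest call edges from source to target (breadth-first search), or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        func, depth = queue.popleft()
        for callee in call_graph.get(func, ()):
            if callee == target:
                return depth + 1
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return None


# Hypothetical call graph: caller -> list of callees.
CALL_GRAPH = {
    "parse_packet": ["read_header", "copy_payload"],
    "copy_payload": ["do_memcpy"],
    "read_header": [],
    "do_memcpy": [],
}

# The patch adds a length check in parse_packet; the overflow fires in do_memcpy.
inter = Vulnerability("parse_packet", "do_memcpy")
# Patch and trigger are in the same function.
intra = Vulnerability("read_header", "read_header")
```

In this toy setting, a detector that only sees `parse_packet` has no view of the `do_memcpy` call two edges away where the flaw is triggered, which is the situation the abstract describes for function-level detectors.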
