CompVPD: Iteratively Identifying Vulnerability Patches Based on Human Validation Results with a Precise Context
Tianyu Chen, Lin Li, Taotao Qian, Jingyi Liu, Wei Yang, Ding Li, Guangtai Liang, Qianxiang Wang, Tao Xie
TL;DR
CompVPD tackles the problem of timely vulnerability patch adoption in open-source software (OSS) by introducing a precise patch-context generation pipeline and an iterative identification framework driven by human validation. It presents two novel algorithms, multi-granularity slicing and adaptive context expanding, to produce reduced, context-rich representations of code commits. These representations enable more accurate vulnerability-patch identification when the model is finetuned with a limited number of human validation results. Evaluated on the VulFix dataset, the approach improves the F1 score by 20% over the best state-of-the-art baseline. It also demonstrates practical value by identifying 20 vulnerability patches and 18 fixes for high-risk bugs in five popular OSS projects. The iterative identification loop further increases the number of patches discovered for the same human effort, and the method remains efficient, processing commits in under half a second. Overall, CompVPD offers a scalable, effective way for security practitioners to rapidly identify and validate vulnerability patches in large codebases.
Abstract
Applying security patches in open-source software in a timely manner is critical for ensuring the security of downstream applications. However, applying these patches promptly is challenging because patch notifications are often incomplete and delayed. To address this issue, existing approaches employ deep-learning (DL) models to identify additional vulnerability patches by determining whether a code commit addresses a vulnerability. Nonetheless, these approaches suffer from low accuracy due to the imprecise context provided for the patches. To provide precise context for patches, we propose a multi-granularity slicing algorithm and an adaptive-expanding algorithm to accurately identify code related to the patches. Additionally, the precise context enables us to design an iterative identification framework, CompVPD, which uses human validation results to substantially improve effectiveness. We empirically compare CompVPD with four state-of-the-art/practice (SOTA) approaches in identifying vulnerability patches. The results demonstrate that CompVPD improves the F1 score by 20% compared to the best score among the SOTA approaches. Furthermore, CompVPD contributes to security practice by helping identify 20 vulnerability patches and 18 fixes for high-risk bugs from 2,500 recent code commits in five highly popular open-source projects.
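The abstract does not spell out how the iterative identification framework uses human validation results. What follows is a generic human-in-the-loop ranking loop under stated assumptions, not the authors' implementation. A hypothetical `ToyScorer` (a token-weight table updated perceptron-style) stands in for the finetuned DL model. Each `Commit`'s token list stands in for the precise patch context. A `validate` callback plays the role of the human reviewer. The sketch illustrates one plausible reason why feeding validation results back between small rounds can surface more patches for the same number of human checks than validating one large batch ranked up front.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Commit:
    """A code commit; `tokens` is a hypothetical stand-in for its precise patch context."""
    sha: str
    tokens: List[str]


@dataclass
class ToyScorer:
    """Hypothetical stand-in for a finetuned DL model: a per-token weight table."""
    weights: dict = field(default_factory=dict)
    lr: float = 1.0

    def score(self, commit: Commit) -> float:
        # Higher score = more likely to be a vulnerability patch.
        return sum(self.weights.get(t, 0.0) for t in commit.tokens)

    def update(self, commit: Commit, is_patch: bool) -> None:
        # Perceptron-style feedback: raise the weights of tokens in confirmed
        # patches and lower those in commits the reviewer rejected.
        delta = self.lr if is_patch else -self.lr
        for t in set(commit.tokens):
            self.weights[t] = self.weights.get(t, 0.0) + delta


def iterative_identify(
    commits: List[Commit],
    scorer: ToyScorer,
    validate: Callable[[Commit], bool],
    budget: int,
    batch_size: int,
) -> List[str]:
    """Spend `budget` human validations in rounds of `batch_size`,
    re-ranking the unvalidated commits after every round."""
    remaining = list(commits)
    confirmed: List[str] = []
    spent = 0
    while remaining and spent < budget:
        k = min(batch_size, budget - spent, len(remaining))
        # Rank with the current scorer (stable sort keeps input order on ties).
        remaining.sort(key=scorer.score, reverse=True)
        batch, remaining = remaining[:k], remaining[k:]
        for commit in batch:
            is_patch = validate(commit)  # one unit of human effort
            spent += 1
            scorer.update(commit, is_patch)  # feed the result back
            if is_patch:
                confirmed.append(commit.sha)
    return confirmed


if __name__ == "__main__":
    truth = {"c1", "c3"}  # hypothetical ground truth known to the "reviewer"

    def make_commits() -> List[Commit]:
        return [
            Commit("c1", ["fix", "overflow", "bounds"]),
            Commit("c2", ["docs", "typo"]),
            Commit("c3", ["bounds", "check"]),
            Commit("c4", ["refactor"]),
        ]

    def reviewer(c: Commit) -> bool:
        return c.sha in truth

    # Same budget of two validations; only the round size differs.
    print(iterative_identify(make_commits(), ToyScorer({"fix": 1.0}), reviewer, budget=2, batch_size=1))
    # ['c1', 'c3']
    print(iterative_identify(make_commits(), ToyScorer({"fix": 1.0}), reviewer, budget=2, batch_size=2))
    # ['c1']
```

With one validation per round, confirming `c1` raises the weight of the shared token `bounds`, so `c3` overtakes `c2` in the next ranking. Validating the top two commits up front instead spends the second check on `c2`. How often CompVPD's actual loop re-finetunes its model, and on what batch size, is not specified in this abstract.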
