VFDelta: A Framework for Detecting Silent Vulnerability Fixes by Enhancing Code Change Learning
Xu Yang, Shaowei Wang, Jiayuan Zhou, Xing Hu
TL;DR
Vulnerabilities in open source software (OSS) are often fixed silently, which makes the fixes hard to detect before public disclosure. VFDelta addresses this by learning a fine-grained code-change representation: it embeds the code before and after a change with two independent models, combines the two embeddings via element-wise subtraction, and trains them end-to-end with a file-level classifier to infer commit-level vulnerability fixes. By keeping the surrounding code as context and optimizing the embedding and classification models jointly, the framework outperforms state-of-the-art baselines on both cross-project and temporal datasets, with notable gains in F1 and inspection-effort metrics. The results, including strong performance on the temporal split and a supporting ablation analysis, indicate practical value for proactive vulnerability management and potential applicability to other code-change tasks such as defect prediction and commit-message generation.
Abstract
Vulnerability fixes in open source software (OSS) usually follow the coordinated vulnerability disclosure model, under which vulnerabilities are fixed silently before they are publicly disclosed. The delay between fix and disclosure can expose OSS users to risk, since malicious parties might exploit the software before the fixes are publicly known. It is therefore important to identify vulnerability fixes early and automatically. Existing methods classify vulnerability fixes by learning code change representations from commits, typically by concatenating code changes, which does not effectively highlight nuanced differences. In addition, previous approaches fine-tune the code embedding model and the classification model separately, which limits overall effectiveness. We propose VFDelta, a lightweight yet effective framework that embeds the code before and after a change using independent models, with the surrounding code as context. By performing element-wise subtraction on these embeddings, we capture fine-grained changes. Our architecture allows the embedding and classification models to be trained jointly, optimizing overall performance. Experiments demonstrate that VFDelta achieves up to 0.33 F1 score and 0.63 CostEffort@5, improving over state-of-the-art methods by 77.4% and 7.1%, respectively. Ablation analysis confirms the importance of our code change representation in capturing small changes. We also expand the dataset and introduce a temporal split to simulate real-world scenarios; in this setting, VFDelta significantly outperforms the baselines VulFixMiner and MiDas across all metrics.
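To make the subtraction-based representation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy hashed bag-of-tokens encoder in place of the learned embedding models, mimics the two independent models with different salts, uses a fixed logistic scorer in place of the jointly trained file-level classifier, and takes the maximum file score as the commit score. The names `toy_embed`, `change_representation`, `file_score`, and `commit_score`, the weights, and the max aggregation are all hypothetical.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 8  # toy embedding width (assumption, not from the paper)


def toy_embed(code: str, salt: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a learned code encoder: hashed bag of tokens."""
    vec = [0.0] * dim
    for token in code.split():
        digest = hashlib.sha256((salt + token).encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # each token adds 1 to one bucket
    return vec


def change_representation(before: str, after: str) -> List[float]:
    """Embed both versions separately, then subtract element-wise."""
    # Different salts mimic two independent embedding models.
    emb_before = toy_embed(before, salt="before")
    emb_after = toy_embed(after, salt="after")
    return [a - b for a, b in zip(emb_after, emb_before)]


def file_score(delta: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Toy file-level classifier: logistic score over the change vector."""
    z = sum(w * d for w, d in zip(weights, delta)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def commit_score(files: List[Tuple[str, str]], weights: List[float]) -> float:
    """Commit-level score as the max over file-level scores (an assumption)."""
    return max(file_score(change_representation(b, a), weights) for b, a in files)


# Illustrative file change: a bounds check added before a copy.
before = "if ( len > 0 ) memcpy ( dst , src , len ) ;"
after = "if ( len > 0 && len <= cap ) memcpy ( dst , src , len ) ;"

delta = change_representation(before, after)
weights = [0.1] * DIM  # fixed illustrative weights; a real model would learn them
score = commit_score([(before, after)], weights)
print(delta)
print(score)
```

In this sketch the encoder and scorer are fixed, so it only illustrates the data flow. In the framework described above, the embedding and classification models are trained jointly, so the classification loss would also update the two encoders.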
