VFDelta: A Framework for Detecting Silent Vulnerability Fixes by Enhancing Code Change Learning

Xu Yang, Shaowei Wang, Jiayuan Zhou, Xing Hu

TL;DR

Vulnerabilities in OSS are often fixed silently, which makes the fixes hard to detect before public disclosure. VFDelta addresses this by learning a fine-grained code-change representation: two independent models embed the code before and after the change, the two embeddings are combined via element-wise subtraction, and the embedding models are trained end-to-end with a file-level classifier to infer commit-level vulnerability fixes. The framework uses context-preserving changes (changed code together with its surrounding code) and joint optimization to outperform state-of-the-art baselines on both cross-project and temporal datasets, reaching up to 0.33 F1 and 0.63 CostEffort@5. Its strong results on the temporal split, together with the ablation analysis, indicate practical value for proactive vulnerability management; the approach may also apply to other code-change tasks such as defect prediction and commit-message generation.

Abstract

Vulnerability fixes in open source software (OSS) usually follow the coordinated vulnerability disclosure model and are therefore made silently. The resulting delay before disclosure can expose OSS users to risk, as malicious parties might exploit the software before fixes are publicly known. Therefore, it is important to identify vulnerability fixes early and automatically. Existing methods classify vulnerability fixes by learning code change representations from commits, typically by concatenating code changes, which does not effectively highlight nuanced differences. Additionally, previous approaches fine-tune code embedding models and classification models separately, which limits overall effectiveness. We propose VFDelta, a lightweight yet effective framework that embeds code before and after changes using independent models with surrounding code as context. By performing element-wise subtraction on these embeddings, we capture fine-grained changes. Our architecture allows joint training of embedding and classification models, optimizing overall performance. Experiments demonstrate that VFDelta achieves up to 0.33 F1 score and 0.63 CostEffort@5, improving over state-of-the-art methods by 77.4% and 7.1%, respectively. Ablation analysis confirms the importance of our code change representation in capturing small changes. We also expanded the dataset and introduced a temporal split to simulate real-world scenarios; VFDelta significantly outperforms the baselines VulFixMiner and MiDas across all metrics in this setting.
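
The representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyEncoder` is a hypothetical stand-in for a pre-trained code embedding model, the subtraction direction (after minus before), the logistic classifier, the embedding size, and the max-based aggregation from file-level scores to a commit-level score are all assumptions made for illustration. Only the forward pass is shown.

```python
import math
import random

DIM = 8  # toy embedding size (assumption, not from the paper)


class ToyEncoder:
    """Hypothetical stand-in for a pre-trained code embedding model."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.table = {}  # token -> vector, created lazily

    def embed(self, code):
        # Mean of per-token vectors over whitespace-separated tokens.
        tokens = code.split()
        total = [0.0] * DIM
        for tok in tokens:
            if tok not in self.table:
                self.table[tok] = [self.rng.uniform(-1, 1) for _ in range(DIM)]
            total = [t + v for t, v in zip(total, self.table[tok])]
        return [t / max(len(tokens), 1) for t in total]


def change_representation(enc_before, enc_after, before, after):
    # Element-wise subtraction of the two embeddings.
    # The direction (after - before) is an assumption.
    e_before = enc_before.embed(before)
    e_after = enc_after.embed(after)
    return [a - b for a, b in zip(e_after, e_before)]


def file_score(delta, weights, bias):
    # Logistic classifier over the change vector (file level).
    z = sum(w * d for w, d in zip(weights, delta)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def commit_score(file_scores):
    # Assumed aggregation: a commit is as suspicious as its most suspicious file.
    return max(file_scores)


# Two independent encoders, one for each side of the change.
enc_before, enc_after = ToyEncoder(seed=1), ToyEncoder(seed=2)
rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(DIM)]

# Each file change keeps surrounding lines as context (hypothetical snippets).
files = [
    ("def read(p):\n    return open(p).read()",
     "def read(p):\n    check(p)\n    return open(p).read()"),
    ("x = 1", "x = 2"),
]
deltas = [change_representation(enc_before, enc_after, b, a) for b, a in files]
scores = [file_score(d, weights, 0.0) for d in deltas]
print(commit_score(scores))
```

In a jointly trained setup, the classification loss would be backpropagated through both encoders as well as the classifier, so that the embeddings adapt to the classification task; that training loop is omitted here.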

Paper Structure (35 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Sample file-level code changes fixing the CVE-2018-11776 vulnerability [Apache2018CVE].
  • Figure 2: The workflow of VFDelta.
  • Figure 3: Comparison of different code change representation learning approaches.
  • Figure 4: Comparison of training loss between VFDelta, EmbedSubtract_Single and EmbedConcat_Duo.
  • Figure 5: The bar plot shows the effectiveness of different approaches, in terms of F1, on groups of commits with different change sizes. The black line shows the proportion of commits in each group. For instance, 46% indicates that 46% of the commits have a code change size between 0 and 20.
  • ...and 1 more figure