Revisiting Vulnerability Patch Identification on Data in the Wild

Ivana Clairine Irsan; Ratnadira Widyasari; Ting Zhang; Huihui Huang; Ferdian Thung; Yikun Li; Lwin Khin Shar; Eng Lieh Ouh; Hong Jin Kang; David Lo

Abstract

Attacks can exploit zero-day or one-day vulnerabilities that are not publicly disclosed. To detect these vulnerabilities, security researchers monitor development activities in open-source repositories to identify unreported security patches. The sheer volume of commits makes this task infeasible to accomplish manually. Consequently, researchers rely on automated security patch detectors, which are commonly trained and evaluated on security patches linked from vulnerability reports in the National Vulnerability Database (NVD). In this study, we assess the effectiveness of these detectors when applied in the wild. Our results show that models trained on NVD-derived data perform substantially worse on in-the-wild security patches, with F1-score decreases of up to 90%, rendering them impractical for real-world use. An analysis comparing security patches identified in the wild with commits linked from NVD reveals that the two can be easily distinguished from each other. Security patches associated with NVD have a different distribution of commit messages, vulnerability types, and composition of changes. These differences suggest that NVD may be unsuitable as the sole source of data for training models to detect security patches. We find that constructing a dataset that combines security patches from NVD data with a small subset of manually identified security patches can improve model robustness.

Paper Structure (28 sections, 4 equations, 6 figures, 14 tables)

Introduction
Background
Security Patch Detection Approaches
Security Patch Benchmark Based on NVD Entries
Assessing the Performance of Research Tools Under Realistic Settings
Security Patches In the Wild
Experimental Setup
Overview
Data
Evaluation Metrics
Results
RQ1: Do models trained only on publicly disclosed security patches generalize to undisclosed security fixes?
RQ2: How do the unreported security patches differ from security patches linked from NVD?
Confirming that security patches from MoreFixes and JavaVFC come from different distributions
Analysis of the commit messages
...and 13 more sections

Figures (6)

Figure 1: Overview of the realistic evaluation of the models trained on NVD-linked security patches. We assess their performance by applying them to in-the-wild patches derived from open-source repositories.
Figure 2: Perplexity scores for the MoreFixes and JavaVFC datasets to measure CodeBERT's familiarity with the data.
Figure 3: Perplexity scores for MoreFixes vs JavaVFC datasets in intra-project setting.
Figure 4: Example of a merge commit linked from NVD that integrates the develop branch into a release branch. This commit changes over a hundred files. The actual security fix (made in a single commit https://github.com/nocodb/nocodb/pull/2495/commits/ac346945f6cd4d1e371d57267d80f6dfdbbcc605) is a small subset of all changes (composed of over 180 commits) made in the merge commit.
Figure 5: Example of a code change made to implement a feature, unrelated to the vulnerability fix.
...and 1 more figure
