Just-in-Time Detection of Silent Security Patches

Xunzhu Tang; Zhenghan Chen; Kisub Kim; Haoye Tian; Saad Ezzini; Jacques Klein

Abstract

Open-source code is pervasive. In this setting, embedded vulnerabilities are spreading to downstream software at an alarming rate. While such vulnerabilities are generally identified and addressed rapidly, inconsistent maintenance policies may lead security patches to go unnoticed. Indeed, security patches can be silent, i.e., they do not always come with comprehensive advisories such as CVEs. This lack of transparency leaves users oblivious to available security updates, providing ample opportunity for attackers to exploit unpatched vulnerabilities. Consequently, identifying silent security patches just in time when they are released is essential for preventing n-day attacks, and for ensuring robust and secure maintenance practices. With LLMDA we propose to (1) leverage large language models (LLMs) to augment patch information with generated code change explanations, (2) design a representation learning approach that explores code-text alignment methodologies for feature combination, (3) implement label-wise training with labelled instructions for guiding the embedding based on security relevance, and (4) rely on a probabilistic batch contrastive learning mechanism for building a high-precision identifier of security patches. We evaluate LLMDA on the PatchDB and SPI-DB literature datasets and show that our approach substantially improves over the state-of-the-art, notably GraphSPD by 20% in terms of F-Measure on the SPI-DB benchmark.

Paper Structure (32 sections, 12 equations, 5 figures, 7 tables)

Introduction
The LLMDA approach
Data augmentation with LLMs
Generation of bimodal input embeddings
PT-Former: Embeddings alignment and Concatenation
Stochastic Batch Contrastive Learning (SBCL)
Prediction and Training Layer for Security Patch Detection
Experimental Setup
Research Questions
Datasets
Evaluation Metrics
Baseline Methods
Implementation
Experiment Results
Overall performance of LLMDA
...and 17 more sections

Figures (5)

Figure 1: Overview of LLMDA.
Figure 2: Architecture of PT-Former.
Figure 3: Overview of our SBCL layer.
Figure 4: Illustration of embedding subspaces of security/non-security patches for contrastive learning.
Figure 5: PCA visualizations of security and non-security patch embeddings by GraphSPD and LLMDA.
