LLM-Enhanced Software Patch Localization

Jinhong Yu; Yi Chen; Di Tang; Xiaozhong Liu; XiaoFeng Wang; Chen Wu; Haixu Tang

TL;DR

This paper introduces LLM-SPL, a recommendation-based SPL approach that uses a Large Language Model (LLM) to locate the security patch commit for a given CVE. It also proposes a joint learning framework in which the LLM's outputs serve as additional features that help the recommendation model prioritize security patches.

Abstract

Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are the leading approaches to this problem. However, existing SPL models often falter when a commit lacks a clear association with its corresponding CVE, and they do not consider the scenario in which a vulnerability receives multiple patches over time before it is fully resolved. To address these challenges, we introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of a Large Language Model (LLM) to locate the security patch commit for a given CVE. More specifically, we propose a joint learning framework in which the outputs of the LLM serve as additional features to aid our recommendation model in prioritizing security patches. Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall while significantly reducing manual effort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL improves Recall by 22.83%, improves NDCG by 19.41%, and reduces manual effort by over 25% when checking up to the top 10 rankings. The dataset and source code are available at https://anonymous.4open.science/r/LLM-SPL-91F8.
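The abstract reports Recall and NDCG over the top 10 ranked commits, including for CVEs with several patches. As a minimal sketch of how such ranking metrics are commonly computed, the Python below assumes binary relevance (a commit either is or is not a patch for the CVE). The function names, commit identifiers, and the binary-relevance choice are illustrative assumptions; the paper's exact metric definitions are not given in this section.

```python
import math

def recall_at_k(ranked, patches, k):
    """Fraction of the CVE's true patch commits found in the top k results."""
    top = set(ranked[:k])
    return len(top & patches) / len(patches)

def ndcg_at_k(ranked, patches, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, commit in enumerate(ranked[:k])
        if commit in patches
    )
    # Ideal ranking places every true patch at the top of the list.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(patches))))
    return dcg / ideal if ideal else 0.0

# Hypothetical ranking for one CVE that was fixed by two patch commits.
ranked = ["c7", "c2", "c9", "c4", "c1"]
patches = {"c2", "c4"}

print(recall_at_k(ranked, patches, 3))  # one of the two patches is in the top 3
print(ndcg_at_k(ranked, patches, 5))
```

With more than one true patch per CVE, Recall at a cutoff rewards a ranking only when all patches surface early, which is why the multi-patch setting is reported separately.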

Paper Structure (22 sections, 4 equations, 17 figures, 11 tables)

Introduction
Background
Vulnerability and Patch
Security Patch Localization and VCMatch
Machine Learning
Challenges in Effective SPL
Challenge 1: Complexity of Content
Challenge 2: Inter-Commit Relations
A Potential Solution: Large Language Model
LLM Potential for Comprehension
LLM Potential for Relation Recognition
LLM Alone is Not Enough
LLM-SPL: Design and Implementation
Design
Feature Generation based on LLM
...and 7 more sections

Figures (17)

Figure 1: Unclear Association: CVE and Commit Example
Figure 2: Example of a vulnerability fixed by multiple patches collaboratively.
Figure 3: One-iteration feedback recommendation process.
Figure 4: Design of joint learning framework.
Figure 5: Architecture of LLM-SPL.
...and 12 more figures
