Table of Contents
Fetching ...

LLM-Enhanced Software Patch Localization

Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang

TL;DR

LLM-SPL is introduced, a recommendation-based SPL approach that leverages the capabilities of the Large Language Model (LLM) to locate the security patch commit for a given CVE, and a joint learning framework is proposed, in which the outputs of LLM serves as additional features to aid the recommendation model in prioritizing security patches.

Abstract

Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are leading approaches to address this. However, existing SPL models often falter when a commit lacks a clear association with its corresponding CVE, and do not consider a scenario that a vulnerability has multiple patches proposed over time before it has been fully resolved. To address these challenges, we introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of the Large Language Model (LLM) to locate the security patch commit for a given CVE. More specifically, we propose a joint learning framework, in which the outputs of LLM serves as additional features to aid our recommendation model in prioritizing security patches. Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall, while significantly reducing manual effort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL significantly improves Recall by 22.83\%, NDCG by 19.41\%, and reduces manual effort by over 25\% when checking up to the top 10 rankings. The dataset and source code are available at \url{https://anonymous.4open.science/r/LLM-SPL-91F8}.

LLM-Enhanced Software Patch Localization

TL;DR

LLM-SPL is introduced, a recommendation-based SPL approach that leverages the capabilities of the Large Language Model (LLM) to locate the security patch commit for a given CVE, and a joint learning framework is proposed, in which the outputs of LLM serves as additional features to aid the recommendation model in prioritizing security patches.

Abstract

Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are leading approaches to address this. However, existing SPL models often falter when a commit lacks a clear association with its corresponding CVE, and do not consider a scenario that a vulnerability has multiple patches proposed over time before it has been fully resolved. To address these challenges, we introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of the Large Language Model (LLM) to locate the security patch commit for a given CVE. More specifically, we propose a joint learning framework, in which the outputs of LLM serves as additional features to aid our recommendation model in prioritizing security patches. Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall, while significantly reducing manual effort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL significantly improves Recall by 22.83\%, NDCG by 19.41\%, and reduces manual effort by over 25\% when checking up to the top 10 rankings. The dataset and source code are available at \url{https://anonymous.4open.science/r/LLM-SPL-91F8}.
Paper Structure (22 sections, 4 equations, 17 figures, 11 tables)

This paper contains 22 sections, 4 equations, 17 figures, 11 tables.

Figures (17)

  • Figure 1: Unclear Association: CVE and Commit Example
  • Figure 2: Example of a vulnerability fixed by multiple patches collaboratively.
  • Figure 3: One-iteration feedback recommendation process.
  • Figure 4: Design of joint learning framework.
  • Figure 5: Architecture of LLM-SPL.
  • ...and 12 more figures