Automating Zero-Shot Patch Porting for Hard Forks

Shengyi Pan, You Wang, Zhongxin Liu, Xing Hu, Xin Xia, Shanping Li

TL;DR

Hard forks enable code reuse but create maintenance challenges as source and fork diverge, making patch propagation slow and security-sensitive. The authors propose PPatHF, a zero-shot, LLM-based patch porting framework with a reduction module to slim inputs and a porting module that applies patches via prompt-based learning and instruction tuning with LoRA. On 310 Vim→Neovim patches, PPatHF achieves 42.3% exact porting and reduces manual edits by 57%, outperforming baselines and demonstrating practical potential to accelerate patch propagation and reduce vulnerability exposure across fork families. The work also shows generalizability to other fork pairs and highlights the effectiveness of combining semantic patch understanding with structured data and lightweight fine-tuning for cross-project code transformations.

Abstract

Forking is a common form of code reuse: it gives developers a simple way to create a software variant (called a hard fork) by copying and modifying an existing codebase. Despite these benefits, forking also leads to duplicated effort in software maintenance. Developers need to port patches across hard forks to address similar bugs or implement similar features. Because the source project and the hard fork diverge, patch porting is complicated: each patch must be adapted to a different implementation of the same functionality. In this work, we take the first step to automate patch porting for hard forks under a zero-shot setting. We first conduct an empirical study of the patches ported from Vim to Neovim over the last ten years to investigate the necessity of patch porting and the potential flaws in current practice. We then propose a large language model (LLM) based approach, PPatHF, to automatically port patches for hard forks on a function-wise basis. PPatHF is composed of a reduction module and a porting module. Given the pre- and post-patch versions of a function from the reference project and the corresponding function from the target project, the reduction module first slims the input functions by removing code snippets less relevant to the patch. The porting module then leverages an LLM to apply the patch to the function from the target project. We evaluate PPatHF on 310 Neovim patches ported from Vim. The experimental results show that PPatHF significantly outperforms the baselines. Specifically, PPatHF correctly ports 131 (42.3%) patches and automates 57% of the manual edits a developer would need to port the patch.
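To make the function-wise data flow concrete, here is a minimal Python sketch of the two stages described in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how relevance is decided, so the reduction below uses a naive line-window heuristic around the diff, and the names `changed_lines`, `reduce_function`, `build_prompt`, and the `/* ... */` placeholder are all hypothetical. The sketch reduces only the reference function and passes the target function through unchanged; no LLM is called.

```python
import difflib

PLACEHOLDER = "/* ... */"  # hypothetical marker for elided code


def changed_lines(pre, post):
    """Return indices of lines in `pre` that the patch touches."""
    touched = set()
    matcher = difflib.SequenceMatcher(a=pre, b=post, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 == i2:
            # Pure insertion: mark the lines on either side of the insert point.
            touched.update(i for i in (i1 - 1, i1) if 0 <= i < len(pre))
        else:
            touched.update(range(i1, i2))
    return touched


def reduce_function(lines, keep, context=2):
    """Keep lines within `context` of any index in `keep`; collapse the rest.

    Assumption: line distance stands in for "relevance to the patch".
    """
    wanted = set()
    for k in keep:
        wanted.update(range(max(0, k - context), min(len(lines), k + context + 1)))
    out = []
    for i, line in enumerate(lines):
        if i in wanted:
            out.append(line)
        elif not out or out[-1] != PLACEHOLDER:
            out.append(PLACEHOLDER)  # one marker per elided run
    return out


def build_prompt(pre, post, target):
    """Assemble a zero-shot prompt; the wording is an assumption."""
    return "\n".join([
        "Reference function before the patch:", *pre,
        "Reference function after the patch:", *post,
        "Target function before the patch:", *target,
        "Target function after the patch:",
    ])


pre = ["int f(int x)", "{", "  int a = 1;", "  int b = 2;", "  int c = 3;",
       "  int d = 4;", "  int e = 5;", "  return x;", "}"]
post = pre[:7] + ["  return x + 1;", "}"]
target = ["int f(int x)", "{", "  return x;", "}"]

reduced_pre = reduce_function(pre, changed_lines(pre, post), context=1)
prompt = build_prompt(reduced_pre, post, target)
```

In this toy input the patch changes one line, so the reduced reference function keeps only that line and its immediate neighbours and collapses the rest into a single placeholder. A real reduction would also need to slim the post-patch and target functions consistently, which this sketch leaves out.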

Paper Structure

This paper contains 22 sections, 4 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: An example of patch porting in Vim-Neovim.
  • Figure 2: Distribution of ported patches over the years.
  • Figure 3: ECDF plot of delta days of patch porting.
  • Figure 4: Overview of PPatHF
  • Figure 5: Framework of the porting module
  • ...and 1 more figure