EfficientEdit: Accelerating Code Editing via Edit-Oriented Speculative Decoding

Peiding Wang, Li Zhang, Fang Liu, Yinghao Zhu, Wang Xu, Lin Shi, Xiaoli Lian, Minxiao Li, Bo Shen, An Fu

TL;DR

EfficientEdit tackles the autoregressive decoding bottleneck in LLM-based code editing with a two-phase reuse-generate paradigm. It reuses code from the to-be-edited input to locate edit locations, then employs an edit-oriented draft model with entropy-aware dynamic verification to generate only the necessary edits. On CanItEdit and CodeIF-Bench, EfficientEdit achieves speedups of up to 13.09× while matching or surpassing greedy-decoding quality, and ablations show that the reuse component provides the largest gains. The approach generalizes across model families and editing tasks, accelerating code edits without sacrificing correctness.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in code editing, substantially enhancing software development productivity. However, the inherent complexity of code editing tasks forces existing approaches to rely on LLMs' autoregressive end-to-end generation, where decoding speed plays a critical role in efficiency. While inference acceleration techniques like speculative decoding have been applied to improve decoding efficiency, these methods fail to account for the unique characteristics of code editing tasks, where changes are typically localized and existing code segments are reused. To address this limitation, we propose EfficientEdit, a novel method that improves LLM-based code editing efficiency through two key mechanisms based on speculative decoding: (1) effective reuse of original code segments while identifying potential edit locations, and (2) efficient generation of edit content via high-quality drafts from edit-oriented draft models and a dynamic verification mechanism that balances quality and acceleration. Experimental results show that EfficientEdit can achieve up to 10.38$\times$ and 13.09$\times$ speedup compared to standard autoregressive decoding on CanItEdit and CodeIF-Bench, respectively, outperforming state-of-the-art inference acceleration approaches by up to 90.6%. The code and data are available at https://github.com/zhu-zhu-ding/EfficientEdit.
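
The reuse mechanism in (1) can be illustrated with a toy sketch of greedy speculative verification, in which the original code itself serves as the draft. This is not the authors' implementation: `verify_draft`, `make_toy_target`, and the word-level tokens are hypothetical names and simplifications introduced here, and the target model is simulated by a function that simply replays a known edited sequence. Real speculative decoding verifies the whole draft in one parallel forward pass; the loop below is sequential only for readability.

```python
from typing import Callable, List, Optional, Sequence, Tuple

NextToken = Callable[[List[str]], str]


def make_toy_target(final_tokens: Sequence[str]) -> NextToken:
    """Hypothetical stand-in for a target LLM: replays a fixed edited sequence."""
    def next_token(prefix: List[str]) -> str:
        if len(prefix) < len(final_tokens):
            return final_tokens[len(prefix)]
        return "<eos>"
    return next_token


def verify_draft(
    prefix: Sequence[str],
    draft: Sequence[str],
    next_token: NextToken,
) -> Tuple[List[str], Optional[str]]:
    """Accept draft tokens while they match the target model's greedy choice.

    Returns the accepted tokens and, at the first mismatch, the token the
    target model prefers instead (None if the whole draft was accepted).
    """
    accepted: List[str] = []
    for tok in draft:
        expected = next_token(list(prefix) + accepted)
        if expected != tok:
            # First disagreement: a candidate edit location.
            return accepted, expected
        accepted.append(tok)
    return accepted, None


# Assumed toy data: the original code is used as the draft.
original = ["a", "b", "c", "d"]
edited = ["a", "b", "X", "d"]
accepted, correction = verify_draft([], original, make_toy_target(edited))
print(accepted, correction)
```

In this toy run the first two tokens are reused unchanged and the mismatch at index 2 marks where editing would begin. How EfficientEdit resumes reuse after an edit, and how its draft model fills in the edit, is described in the paper and not modeled here.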

Paper Structure

This paper contains 32 sections, 11 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: A motivating example of code editing. Highlighted contents in "Code After" indicate newly generated or modified code. A significant portion of "Code After" (approximately 70%) is reused from "Code Before".
  • Figure 2: The overview of EfficientEdit.
  • Figure 3: Inference speedup of EfficientEdit with different reuse rates on CanItEdit. The reuse rate is the percentage of tokens in the final edited code that come from the original code.
  • Figure 4: Case study. An example generated by the Qwen2.5-Coder configuration from CanItEdit (task ID 25). The proportion of newly edited lines (patches) is approximately 16% of the final code.
  • Figure 5: EfficientEdit performance (Pass@1 and Speedup) at different base entropy thresholds $k$.
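
The reuse rate defined in the Figure 3 caption (the percentage of tokens in the final edited code that come from the original code) can be approximated as follows. This is a minimal sketch under stated assumptions: the function name `reuse_rate` is hypothetical, whitespace splitting stands in for the model tokenizer, and matching blocks from Python's `difflib.SequenceMatcher` are used as a proxy for "tokens that come from the original code". The paper may measure it differently.

```python
from difflib import SequenceMatcher
from typing import Sequence


def reuse_rate(original_tokens: Sequence[str], edited_tokens: Sequence[str]) -> float:
    """Percentage of edited tokens that fall inside blocks shared with the original."""
    if not edited_tokens:
        return 0.0
    matcher = SequenceMatcher(a=original_tokens, b=edited_tokens, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, which adds nothing.
    reused = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * reused / len(edited_tokens)


# Assumed toy example with whitespace tokenization.
before = "def f ( x ) : return x".split()
after = "def f ( x ) : return x + 1".split()
print(f"{reuse_rate(before, after):.1f}%")  # 8 of 10 edited tokens are reused: 80.0%
```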