EfficientEdit: Accelerating Code Editing via Edit-Oriented Speculative Decoding
Peiding Wang, Li Zhang, Fang Liu, Yinghao Zhu, Wang Xu, Lin Shi, Xiaoli Lian, Minxiao Li, Bo Shen, An Fu
TL;DR
EfficientEdit tackles the autoregressive decoding bottleneck in LLM-based code editing with a two-phase reuse-generate paradigm. It first reuses code from the to-be-edited input, which also identifies where edits are needed, and then uses an edit-oriented draft model with entropy-aware dynamic verification to generate only the necessary edits. On CanItEdit and CodeIF-Bench, EfficientEdit achieves up to 13.09× speedup while maintaining or surpassing greedy-decoding quality, and ablations show that the reuse component provides the largest gains. The approach generalizes across model families and editing tasks, accelerating code edits without sacrificing correctness.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in code editing, substantially enhancing software development productivity. However, the inherent complexity of code editing tasks forces existing approaches to rely on LLMs' autoregressive end-to-end generation, where decoding speed plays a critical role in efficiency. While inference acceleration techniques such as speculative decoding have been applied to improve decoding efficiency, these methods fail to account for the unique characteristics of code editing tasks, where changes are typically localized and existing code segments are reused. To address this limitation, we propose EfficientEdit, a novel method that improves LLM-based code editing efficiency through two key mechanisms based on speculative decoding: (1) effective reuse of original code segments while identifying potential edit locations, and (2) efficient generation of edit content via high-quality drafts from edit-oriented draft models and a dynamic verification mechanism that balances quality and acceleration. Experimental results show that EfficientEdit achieves up to 10.38× and 13.09× speedup over standard autoregressive decoding on CanItEdit and CodeIF-Bench, respectively, outperforming state-of-the-art inference acceleration approaches by up to 90.6%. The code and data are available at https://github.com/zhu-zhu-ding/EfficientEdit.
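To make mechanism (1) concrete, the following is a minimal sketch of how original code tokens could serve as a draft that the target model verifies, with the first rejected token marking a potential edit location. It is an illustration under stated assumptions, not the authors' implementation: the names `reuse_original` and `make_toy_target` are hypothetical, the "target model" is a toy stand-in that already knows the edited output, and verification is done token by token here, whereas a real speculative decoder would typically verify the whole draft in one forward pass.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
# Hypothetical interface: given the tokens so far, return the target model's greedy next token.
NextTokenFn = Callable[[Sequence[Token]], Token]


def reuse_original(
    prefix: Sequence[Token],
    original: Sequence[Token],
    target_next: NextTokenFn,
) -> Tuple[List[Token], int]:
    """Treat the original code as a draft and keep its longest verified prefix.

    Returns the accepted tokens and the index in `original` of the first
    token the target model disagrees with (a candidate edit location).
    If every token is accepted, the index equals len(original).
    """
    accepted: List[Token] = []
    for i, tok in enumerate(original):
        predicted = target_next(list(prefix) + accepted)
        if predicted != tok:
            return accepted, i  # mismatch: generation would take over here
        accepted.append(tok)
    return accepted, len(original)


def make_toy_target(edited: Sequence[Token]) -> NextTokenFn:
    """Toy stand-in for a target LLM: it deterministically emits `edited`."""
    def target_next(context: Sequence[Token]) -> Token:
        pos = len(context)
        return edited[pos] if pos < len(edited) else "<eos>"
    return target_next


# Assumed example: the edit changes a single operator.
original = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "-", "b"]
edited = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]

accepted, edit_at = reuse_original([], original, make_toy_target(edited))
print(len(accepted), edit_at)
```

In this toy case all tokens before the changed operator are accepted without being generated one by one, which is the intuition behind reusing unchanged code; how the second phase drafts and verifies the edit itself is not modeled here.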
