CoEdPilot: Recommending Code Edits with Learned Prior Edit Relevance, Project-wise Awareness, and Interactive Nature

Chenyan Liu, Yufan Cai, Yun Lin, Yuhuan Huang, Yunrui Pei, Bo Jiang, Ping Yang, Jin Song Dong, Hong Mei

TL;DR

CoEdPilot is an interactive, transformer-based framework for recommending practical code edits by modeling the ripple effects of an edit and its interactions across a software project. It uses a three-component pipeline (subsequent edit analysis, prior edit analysis, and edit generation) that is triggered by an edit event and an optional user prompt, and it is trained on over 180K commits from 471 open-source projects in 5 programming languages. Empirical results show edit-location accuracy of 70.8%-85.3% and edit-generation quality with a BLEU4 score of 60.7 and an exact-match rate of 41.8%, along with notable improvements when incorporating selective prior edits and a positive user study in a VS Code integration. The approach is modular, scalable, and publicly available, enabling practitioners to plug in other edit generators and extend project-wise awareness in real editing tasks.

Abstract

Recent years have seen the development of LLM-based code generation. Compared to generating code in a software project, incremental code edits are empirically observed to be more frequent. The emerging code editing approaches usually formulate the problem as generating an edit based on known relevant prior edits and context. However, practical code edits can be more complicated. First, an editing session can include multiple (ir)relevant edits to the code under edit. Second, the inference of the subsequent edits is non-trivial as the scope of its ripple effect can be the whole project. In this work, we propose CoEdPilot, an LLM-driven solution to recommend code edits by discriminating the relevant edits, exploring their interactive natures, and estimating its ripple effect in the project. Specifically, CoEdPilot orchestrates multiple neural transformers to identify what and how to edit in the project regarding both edit location and edit content. When a user accomplishes an edit with an optional editing description, a Subsequent Edit Analysis first reports the most relevant files in the project with what types of edits (e.g., keep, insert, and replace) can happen for each line of their code. Next, an Edit-content Generator generates concrete edit options for the lines of code, regarding its relevant prior changes reported by an Edit-dependency Analyzer. Lastly, both the Subsequent Edit Analysis and the Edit-content Generator capture relevant prior edits as feedback to readjust their recommendations. We train our models by collecting over 180K commits from 471 open-source projects in 5 programming languages. Our extensive experiments show that CoEdPilot can well predict the edits (i.e., predicting edit location with an accuracy of 70.8%-85.3%, and the edit content with an exact match rate of 41.8% and BLEU4 score of 60.7)...
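The data flow described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the names `Edit`, `locate`, `is_relevant`, `generate`, and `recommend` are not from the paper, and the token-matching heuristics are toy stand-ins for the neural transformers that CoEdPilot actually trains. The sketch only shows how the three stages could hand results to one another, assuming prior edits are single-line changes.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Edit:
    """A prior single-line edit (hypothetical representation)."""
    path: str
    before: str  # line content before the edit
    after: str   # line content after the edit


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text))


def _removed(edit: Edit) -> Set[str]:
    # Identifiers present before the edit but gone afterwards.
    return _tokens(edit.before) - _tokens(edit.after)


def is_relevant(edit: Edit, line: str) -> bool:
    """Toy stand-in for the edit-dependency analyzer (a learned model in the paper)."""
    return bool(_removed(edit) & _tokens(line))


def locate(project: Dict[str, List[str]], prior_edits: List[Edit]) -> Dict[str, List[str]]:
    """Toy stand-in for subsequent edit analysis: one edit-type label per line."""
    return {
        path: ["replace" if any(is_relevant(e, ln) for e in prior_edits) else "keep"
               for ln in lines]
        for path, lines in project.items()
    }


def generate(line: str, relevant: List[Edit], prompt: Optional[str] = None) -> List[str]:
    """Toy stand-in for the edit-content generator: replay simple renames.

    The optional prompt is accepted but ignored by this toy version.
    """
    options = []
    for e in relevant:
        old, new = _removed(e), _tokens(e.after) - _tokens(e.before)
        if len(old) == 1 and len(new) == 1:
            o, n = next(iter(old)), next(iter(new))
            options.append(re.sub(rf"\b{re.escape(o)}\b", n, line))
    return options


def recommend(project: Dict[str, List[str]], prior_edits: List[Edit],
              prompt: Optional[str] = None) -> List[Tuple[str, int, str, List[str]]]:
    """Locate candidate lines, pick relevant prior edits per line, then generate options."""
    results = []
    labels = locate(project, prior_edits)
    for path, lines in project.items():
        for i, (line, label) in enumerate(zip(lines, labels[path])):
            if label == "keep":
                continue
            relevant = [e for e in prior_edits if is_relevant(e, line)]
            results.append((path, i, label, generate(line, relevant, prompt)))
    return results


project = {
    "cart.py": ["def total(items):", "    return sum(get_price(i) for i in items)"],
    "util.py": ["def fmt(x):", "    return str(x)"],
}
prior_edits = [
    Edit("shop.py", "price = get_price(item)", "price = fetch_price(item)"),
    Edit("README.md", "Old title", "New title"),  # irrelevant to the code lines
]
print(recommend(project, prior_edits))
```

In this toy run only the second line of `cart.py` is flagged, and the unrelated README edit is filtered out before generation. The feedback loop mentioned in the abstract would correspond to appending each accepted recommendation to `prior_edits` and calling `recommend` again.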

Paper Structure (29 sections, 5 equations, 7 figures, 9 tables)

This paper contains 29 sections, 5 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: State-of-the-art code editing framework (MODIT, GRACE, CodeEditor). The dotted rectangles represent the code before and after the recommended edits.
  • Figure 2: The type of edit propagation in the code-editing example shown in the two example tables (LaTeX labels tab:example11 and tab:example12).
  • Figure 3: Overview of CoEdPilot, consisting of subsequent edit analysis, edit generation, and prior edit analysis. The analysis is triggered whenever an edit-trigger event occurs. CoEdPilot orchestrates a set of transformer-based neural components to accomplish the editing task.
  • Figure 4: An example input to our transformer for learning edit dependency.
  • Figure 5: Overview of the fine-grained edit location architecture. We formulate edit location as a masked language modeling (MLM) problem that predicts the edit type of each line of code (LoC).
  • ...and 2 more figures
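
The per-line edit types mentioned in the Figure 5 caption can be made concrete with a small sketch. This is not the paper's model or its data pipeline; it only illustrates, under stated assumptions, how keep/insert/replace labels for each line could be derived from a before/after pair using Python's standard `difflib`. The function name `line_edit_labels` is hypothetical, and two conventions are assumptions of this sketch: deleted lines are labeled "replace" (replaced by nothing), and an insertion is attached to the line it follows.

```python
import difflib
from typing import List


def line_edit_labels(old: List[str], new: List[str]) -> List[str]:
    """Return one label per line of `old`: 'keep', 'insert', or 'replace'."""
    labels = ["keep"] * len(old)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            # Assumption: a deletion is treated as replacing the line with nothing.
            for i in range(i1, i2):
                labels[i] = "replace"
        elif tag == "insert" and i1 > 0 and labels[i1 - 1] == "keep":
            # Assumption: new code is inserted *after* the labeled line.
            # Insertions before the first line have no line to attach to here.
            labels[i1 - 1] = "insert"
    return labels


old = ["import os", "def f(x):", "    return x", "print(f(1))"]
new = ["import os", "def f(x, y):", "    return x", "    # done", "print(f(1))"]
print(line_edit_labels(old, new))
```

A locator in the style the caption describes would predict such a label sequence for unseen code rather than compute it from a known diff; the sketch only shows what the target labels look like.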