Automated Code Editing with Search-Generate-Modify

Changshu Liu, Pelin Cetin, Yogesh Patodia, Saikat Chakraborty, Yangruibo Ding, Baishakhi Ray

TL;DR

SarGaM tackles automated code editing by mimicking a developer’s workflow: search for related patches, generate a candidate edit, and refine it with a fine-grained modification step. The method combines three components (search augmentation, generation with off-the-shelf code models, and a Levenshtein Transformer-based edit model) to produce high-quality patches for code edits and bug fixes. Across code-editing benchmarks and automated program repair datasets, SarGaM consistently outperforms generation-only and edit-only baselines, which demonstrates the value of integrating retrieval with fine-grained edits. The results indicate that retrieval-informed generation followed by precise edit operations can advance practical software maintenance tools, and the work releases a prototype implementation for further exploration.

Abstract

Code editing is essential in evolving software development. Many automated code editing tools have been proposed that leverage both Information Retrieval-based techniques and Machine Learning-based code generation and code editing models. Each technique comes with its own promises and perils, and they are often used together to complement their strengths and compensate for their weaknesses. This paper proposes a hybrid approach to better synthesize code edits by leveraging the power of code search, generation, and modification. Our key observation is that a patch obtained by search and retrieval, even if imperfect, can provide helpful guidance to a code generation model. However, a retrieval-guided patch produced by a code generation model can still be a few tokens off from the intended patch. Such generated patches can be slightly modified to create the intended patches. SarGaM is a novel tool designed to mimic a real developer's code editing behavior. Given an original code version, the developer may search for related patches, generate or write the code, and then modify the generated code to adapt it to the right context. Our evaluation of SarGaM on edit generation shows superior performance with respect to current state-of-the-art techniques. SarGaM also shows great effectiveness on automated program repair tasks.
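
To make "a few tokens off" concrete, the following minimal sketch measures the token-level gap between the generated patch and the expected fix from the paper's motivating example (Figure 2). It is an illustration only, not SarGaM's implementation: the regex tokenizer, the use of Python's difflib, and the function names tokenize and token_edits are all assumptions introduced here.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    # Naive lexer for illustration only (an assumption, not the paper's tokenizer):
    # identifiers, integer literals, or any single non-space symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def token_edits(generated: str, target: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-equal edit operations that turn `generated` into `target`.

    Each entry is (operation, tokens removed from generated, tokens added from target).
    """
    src, dst = tokenize(generated), tokenize(target)
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    return [
        (tag, src[i1:i2], dst[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Patches quoted from the motivating example in Figure 2.
generated_patch = "int i=begin; i< weights.length"
expected_patch = "int i=begin; i< begin+length"

for operation, removed, added in token_edits(generated_patch, expected_patch):
    print(operation, removed, added)
```

Under this tokenization the two patches share eight of their ten tokens, and the only difference is the pair of tokens that the modification step has to rewrite.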

Paper Structure

This paper contains 36 sections, 1 equation, 10 figures, 7 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Different Types of Transformer-based Generative Models
  • Figure 2: Overview of the SarGaM pipeline and a motivating example of a bug-fixing patch taken from the Defects4J$_{1.2}$ dataset. Inside a for loop, the loop counter initialization and loop condition (int i=0; i< weights.length) are buggy, and (int i=begin; i< begin+length) is the expected fix. In Search (Step 1), SarGaM retrieves a similar patch (int i=begin; i< n); the retrieved begin token benefits Generation (Step 2). The generated patch, (int i=begin; i< weights.length), is close to the ground truth but still incorrect. Finally, the Modification model (Step 3) edits the generated patch by deleting "weights." and inserting "begin+".
  • Figure 3: Search-Augmented Input Modalities of SarGaM
  • Figure 4: Example modification steps generated by the Levenshtein Transformer (LevT) for the motivating example. The encoder takes the patch location, context, and optional developer's intent as input and outputs hidden states $H=\{h_1, h_2, \cdots, h_N\}$, where $N$ is the length of the input sequence. The LevT decoder takes $H$ and the patch location and, after several Transformer decoder layers, outputs $(z_1, z_2, \cdots, z_M)$. These outputs are passed to three classifiers (deletion, placeholder, insertion) that perform the edits.
  • Figure 5: Example correct patches generated by SarGaM. Inputs are presented in light brown boxes, and synthesized patches are presented in light green boxes.
  • ...and 5 more figures
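
The three classifier decisions named in the Figure 4 caption (deletion, placeholder, insertion) can be illustrated on the Figure 2 example with a minimal sketch. Everything here is an assumption made for illustration: the decisions are written by hand rather than predicted by a model, the function name apply_levt_edit is hypothetical, and the slot convention (one placeholder count before the first kept token and one after each kept token) is a simplification rather than the paper's exact formulation.

```python
# Sentinel that marks an open placeholder slot; a unique object so it can
# never collide with a real code token.
PLACEHOLDER = object()


def apply_levt_edit(tokens, delete_mask, placeholder_counts, fill_tokens):
    """Apply one Levenshtein-Transformer-style refinement round to a token list.

    delete_mask[i]        : True if tokens[i] is dropped (deletion decision).
    placeholder_counts[k] : slots opened before the first kept token (k = 0)
                            or after kept token k - 1 (placeholder decision).
    fill_tokens           : tokens written into the open slots, left to right
                            (insertion decision).
    """
    if len(delete_mask) != len(tokens):
        raise ValueError("delete_mask must have one entry per token")

    # Phase 1: deletion.
    kept = [tok for tok, drop in zip(tokens, delete_mask) if not drop]
    if len(placeholder_counts) != len(kept) + 1:
        raise ValueError("placeholder_counts needs len(kept) + 1 entries")

    # Phase 2: open placeholder slots around the kept tokens.
    with_slots = [PLACEHOLDER] * placeholder_counts[0]
    for tok, count in zip(kept, placeholder_counts[1:]):
        with_slots.append(tok)
        with_slots.extend([PLACEHOLDER] * count)
    if sum(placeholder_counts) != len(fill_tokens):
        raise ValueError("need exactly one fill token per placeholder")

    # Phase 3: fill every slot with the next inserted token.
    fills = iter(fill_tokens)
    return [next(fills) if tok is PLACEHOLDER else tok for tok in with_slots]


# Generated patch from Figure 2, tokenized by hand.
generated = ["int", "i", "=", "begin", ";", "i", "<", "weights", ".", "length"]
delete_mask = [False] * 7 + [True, True, False]  # drop "weights" and "."
placeholder_counts = [0] * 7 + [2, 0]            # two slots right after "<"
fill_tokens = ["begin", "+"]

refined = apply_levt_edit(generated, delete_mask, placeholder_counts, fill_tokens)
print(" ".join(refined))
```

Separating the placeholder step from the insertion step means the number of new tokens is fixed before their content is chosen, which is why the caption lists three classifiers rather than two.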

Theorems & Definitions (1)

  • Definition 3.1