Robust Learning of Diverse Code Edits

Tushar Aggarwal, Swayam Singh, Abhijeet Awasthi, Aditya Kanade, Nagarajan Natarajan

TL;DR

This work tackles the limited code-editing capabilities of current language models by combining a diverse synthetic data pipeline with a robust fine-tuning method. It introduces SeleKT, a selective knowledge transfer algorithm that performs dense gradient updates followed by a sparse projection under an $L_0$ constraint, minimizing catastrophic forgetting while improving code-editing performance. The authors construct NextCoder models derived from QwenCoder-2.5 and DeepSeekCoder, demonstrating strong code-editing results across five benchmarks, often outperforming larger models while retaining general abilities such as code generation and problem solving. The open-source release of the models, synthetic data, and implementation facilitates broader adoption and future work on robust code-editing AI systems.

Abstract

Software engineering activities frequently involve edits to existing code. However, contemporary code language models (LMs) lack the ability to handle diverse types of code-edit requirements. In this work, we attempt to overcome this shortcoming through (1) a novel synthetic data generation pipeline and (2) a robust model adaptation algorithm. Starting with seed code examples and diverse editing criteria, our pipeline generates high-quality samples comprising original and modified code, along with natural language instructions in different styles and verbosity. Today's code LMs come bundled with strong abilities, such as code generation and instruction following, which should not be lost due to fine-tuning. To ensure this, we propose a novel adaptation algorithm, SeleKT, that (a) leverages a dense gradient-based step to identify the weights that are most important for code editing, and (b) does a sparse projection onto the base model to avoid overfitting. Using our approach, we obtain a new series of models, NextCoder (adapted from QwenCoder-2.5), that achieves strong results on five code-editing benchmarks, outperforming comparably sized models and even several larger ones. We show the generality of our approach on two model families (DeepSeekCoder and QwenCoder), compare against other fine-tuning approaches, and demonstrate robustness by showing retention of code generation and general problem-solving abilities after adaptation. We open-source the models, synthetic dataset, and implementation at https://aka.ms/nextcoder.
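
The sparse projection in step (b) can be illustrated with a small sketch. It is not the authors' implementation: it assumes the projection keeps the fraction `alpha` of weights whose values changed most (by magnitude) during the dense step and resets every other weight to its base value. The function name `sparse_project`, the flat-list representation of the weights, and the magnitude-based selection rule are all assumptions made for illustration.

```python
from typing import List


def sparse_project(theta_base: List[float], theta_ft: List[float], alpha: float) -> List[float]:
    """Hypothetical sketch: keep only the k largest-magnitude weight changes.

    k = round(alpha * N). Every other coordinate is reset to its base value,
    so the result differs from theta_base in at most k positions.
    """
    if len(theta_base) != len(theta_ft):
        raise ValueError("weight vectors must have the same length")
    n = len(theta_base)
    k = max(0, min(n, round(alpha * n)))
    # Task vector: how far each weight moved during the dense update.
    delta = [ft - base for base, ft in zip(theta_base, theta_ft)]
    # Indices of the k largest-magnitude changes (ties go to the lower index).
    ranked = sorted(range(n), key=lambda i: (-abs(delta[i]), i))
    keep = set(ranked[:k])
    # Selected weights keep their fine-tuned value; the rest revert to base.
    return [theta_ft[i] if i in keep else theta_base[i] for i in range(n)]


# Toy example with N = 5 weights and alpha = 0.4, i.e. k = 2 retained updates.
base = [0.0, 1.0, -1.0, 2.0, 0.5]
fine_tuned = [0.1, 1.5, -1.05, 1.0, 0.5]
projected = sparse_project(base, fine_tuned, alpha=0.4)
changed = sum(p != b for p, b in zip(projected, base))
```

In the toy example only the two largest changes (at indices 1 and 3) survive, so `projected` differs from `base` in exactly two positions. A real model would apply the same idea to its parameter tensors rather than to a flat Python list.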

Paper Structure

This paper contains 47 sections, 1 theorem, 2 equations, 11 figures, 15 tables, 1 algorithm.

Key Result

Lemma 1

For any given base LM $\theta_{\text{base}}$, and for the setting $\alpha = c/N$, where $N$ is the model size, the fine-tuned model $\theta_{\text{FT}}$ satisfies the constraint in the SeleKT objective.
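
The objective itself is not reproduced on this page. As a reading aid only, and assuming the constraint is the $L_0$ bound mentioned in the TL;DR (a cap of $c$ on the number of weights that differ from the base model), the lemma would amount to:

$$\|\theta_{\text{FT}} - \theta_{\text{base}}\|_0 \;\le\; \alpha N \;=\; \frac{c}{N} \cdot N \;=\; c$$

Under that assumption, a projection that retains at most $\alpha N$ changed weights leaves at most $c$ coordinates different from $\theta_{\text{base}}$, which is exactly the bound.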

Figures (11)

  • Figure 1: Performance of state-of-the-art code LMs, in the parameter range 6.7B-16B, on code-editing benchmarks. NextCoder-7B is our code-editing model with QwenCoder-2.5-7B as the base, fine-tuned using the proposed SeleKT algorithm on synthetic and real code-editing tasks. For NoFunEval, we consider instances with binary oracles to ensure consistency with other benchmarks. Detailed results are presented in the main results section and the main results table.
  • Figure 2: Our synthetic data generation pipeline. The input to the pipeline is a seed code snippet, a modularity (function, class, or file) that defines the scope of the output code, and the aspects to improve upon (latency, resource utilization, runtime efficiency, maintainability, security, and general improvements, along with bug fixing). The output is a synthetic example, approved by the final quality checker, consisting of a problem statement, source code, target code, and instructions in different styles and verbosity (detailed, concise, human-like, conversational). The details of the pipeline stages are presented in the running text.
  • Figure 3: Proposed adaptive fine-tuning technique SeleKT.
  • Figure 4: Performance of state-of-the-art code LMs on Aider and Aider Polyglot benchmarks. NextCoder-x is our code-editing model with QwenCoder-2.5-x as the base, fine-tuned using the proposed SeleKT algorithm on synthetic and real code-editing tasks. Baseline scores are sourced from the official leaderboard [aiderpolyglotCodeEditing].
  • Figure 5: Prompt used for generating a problem and source code conditioned on the given seed code.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Lemma 1
  • Remark 1: Efficiency
  • Remark 2: Alternative Update