ReCode: Updating Code API Knowledge with Reinforcement Learning

Haoze Wu, Yunzhi Yao, Wenhao Yu, Ningyu Zhang

TL;DR

The paper tackles the challenge of LLM code generation degrading when external library APIs update frequently. It introduces ReCode, a rule-based reinforcement fine-tuning framework that integrates prompt-provided API updates with an AST-aware, edit-similarity reward to train code migration. A training dataset of approximately 2K entries is created from real API release notes, and experiments across GRPO and DAPO demonstrate improved code-migration performance on CodeUpdateArena while preserving general coding ability better than supervised fine-tuning. The results show that ReCode can surpass larger baselines on key metrics and generalize to unseen API updates, positioning it as a promising approach for dynamic knowledge in code generation.

Abstract

Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, which stems from reliance on outdated API knowledge in their training data and persists even with access to current documentation, impedes reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics how human programmers adapt to API changes. Specifically, we construct a dataset of approximately 2,000 entries to train LLMs to perform version migration based on updated information. We then introduce a modified string similarity metric for code evaluation as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared to supervised fine-tuning, ReCode has less impact on LLMs' general code generation abilities. We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms both the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.

Paper Structure

This paper contains 31 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Top: LLMs cannot be aware of API updates that occur after their release date, which may lead to code errors. Bottom: Simply incorporating update information into the prompt cannot effectively alleviate the issue of outdated APIs. ReCode enhances their ability to migrate code to new versions through rule-based RFT.
  • Figure 2: Training Reward and Test Pass Rate during RL Fine-Tuning. It shows that Qwen2.5-Coder-7B-Instruct improves on the unseen CodeUpdateArena, even surpassing the 32B code model and the reasoning model after training. The two dashed lines in the figure represent the Pass@1 of the corresponding models on CodeUpdateArena.
  • Figure 3: The pipeline of data collection and training task with a running example.
  • Figure 4: Correctness Reward and Training Pipeline (taking GRPO as an example). The dashed box shows the correctness reward of our design, which includes two parts: syntax checking and string matching. It is worth mentioning that ReCode can be adapted to any reinforcement learning algorithm and is not limited to GRPO.
  • Figure 5: Left: The proportion of data with inter-group reward variance being $0$. Right: The change in response length during training.
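
The Figure 4 caption describes the correctness reward as two parts: syntax checking and string matching. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation. The function names `correctness_reward` and `group_advantages`, the -1.0 penalty for unparsable code, and the use of `difflib.SequenceMatcher` as a stand-in for the paper's modified similarity metric are all hypothetical. The second function shows the standard group-normalized advantage used by GRPO-style algorithms. Relating it to the zero-variance case plotted in Figure 5 is an interpretation rather than something the caption states.

```python
import ast
import difflib
import statistics


def correctness_reward(candidate: str, reference: str) -> float:
    """Hypothetical two-part reward: syntax check, then string matching."""
    # Part 1: syntax checking. Code that does not parse gets a fixed
    # penalty (the value -1.0 is an assumption made for this sketch).
    try:
        ast.parse(candidate)
    except (SyntaxError, ValueError):
        return -1.0
    # Part 2: string matching. difflib's ratio lies in [0, 1] and stands in
    # for the paper's modified similarity metric, which is not reproduced here.
    return difflib.SequenceMatcher(None, candidate, reference).ratio()


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantages for the samples drawn for one prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # If every sample has the same reward, std is 0 and all advantages are 0,
    # so that group contributes no learning signal.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "out = np.sum(x, axis=0)"
    samples = [
        "out = np.sum(x, axis=0)",  # exact match
        "out = np.sum(x)",          # parses, but differs from the reference
        "out = np.sum(x, axis=0",   # unbalanced parenthesis, fails to parse
    ]
    rewards = [correctness_reward(s, reference) for s in samples]
    print(rewards)
    print(group_advantages(rewards))
```

With a reward of this shape, a group in which all sampled responses score identically yields zero advantages everywhere, which is why the share of zero-variance groups is a useful quantity to track during training.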