SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning

Zexiong Ma, Chao Peng, Pengfei Gao, Xiangxin Meng, Yanzhen Zou, Bing Xie

TL;DR

SoRFT tackles the cost and privacy drawbacks of relying on commercial models for issue resolving by building an open-source, subtask-based training regime. It decomposes issue resolving into four subtasks (file localization, function localization, line localization, and code edit generation), then applies rejection-sampled supervised fine-tuning followed by rule-based reinforcement learning that uses PPO with ground-truth-based rewards. The approach yields state-of-the-art results among open-source LLMs on SWE-Bench Verified and SWE-Bench Lite, and also improves general code-task performance, suggesting that the gains generalize beyond issue resolving. Overall, SoRFT demonstrates that carefully designed subtasks and ground-truth-based reward signals can markedly enhance open-source models for real-world software maintenance tasks without resorting to proprietary APIs.

Abstract

Mainstream issue-resolving frameworks predominantly rely on commercial models, leading to high costs and privacy concerns. Existing training approaches for issue resolving struggle with poor generalization and fail to fully leverage open-source development resources. We propose Subtask-oriented Reinforced Fine-Tuning (SoRFT), a novel training approach to enhance the issue-resolving capability of LLMs. We decompose issue resolving into structured subtasks: file localization, function localization, line localization, and code edit generation. SoRFT consists of two training stages: (1) rejection-sampled supervised fine-tuning, in which Chain of Thought (CoT) data is filtered using the ground truth before fine-tuning the LLM, and (2) rule-based reinforcement learning, which leverages PPO with ground-truth-based rewards. We evaluate the SoRFT-trained model on SWE-Bench Verified and SWE-Bench Lite, achieving state-of-the-art (SOTA) performance among open-source models (e.g., resolving 21.4% of issues on SWE-Bench Verified with SoRFT-Qwen-7B). The experimental results demonstrate that SoRFT significantly enhances issue-resolving performance, improves model generalization, and provides a cost-efficient alternative to commercial models.

Paper Structure

This paper contains 43 sections, 7 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Rule-based reward example for the file localization subtask. The LLM generates CoT data for a given issue; the reward for the sampled CoT is then calculated as the $F_\beta$ score between the extracted answer and the ground-truth answer.
  • Figure 2: SoRFT consists of three parts: (1) decomposing issue resolving into four subtasks: file localization, function localization, line localization, and code edit generation; (2) fine-tuning LLMs with rejection-sampled CoT data so that they follow the task format and reasoning methods for each subtask; (3) employing rule-based reinforcement learning to further enhance the issue-resolving ability of LLMs.
  • Figure 3: Comparison of rule-based reward strategies: hit score vs. $F_\beta$ score.
  • Figure 4: Reward over PPO training steps.
  • Figure 5: Performance of models trained with different training strategies.
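
The rule-based reward described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default value β = 1, and the reading of "hit score" as 1 when any predicted item matches the ground truth are all assumptions, and the answer is assumed to have already been extracted from the CoT as a list of file paths. The $F_\beta$ formula itself is the standard one, $F_\beta = (1+\beta^2)\,P\,R / (\beta^2 P + R)$, with precision $P$ and recall $R$.

```python
from typing import Iterable


def f_beta_reward(predicted: Iterable[str],
                  ground_truth: Iterable[str],
                  beta: float = 1.0) -> float:
    """F-beta score between predicted and ground-truth items.

    beta = 1.0 is an assumed default; the value used in the paper
    is not stated in this summary.
    """
    pred, gold = set(predicted), set(ground_truth)
    hits = len(pred & gold)
    if hits == 0:
        # Covers empty predictions, empty ground truth, and no overlap.
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def hit_reward(predicted: Iterable[str],
               ground_truth: Iterable[str]) -> float:
    """Assumed reading of 'hit score': 1 if any prediction is correct."""
    return 1.0 if set(predicted) & set(ground_truth) else 0.0


if __name__ == "__main__":
    gold = ["src/core.py"]                      # hypothetical file names
    focused = ["src/core.py"]
    padded = ["src/core.py", "src/a.py", "src/b.py", "src/c.py"]
    for name, pred in [("focused", focused), ("padded", padded)]:
        print(name, hit_reward(pred, gold), round(f_beta_reward(pred, gold), 3))
```

Under these assumptions, a prediction padded with extra files keeps the same hit score as a focused one but receives a lower $F_\beta$ reward, because precision drops. This is one plausible reason to compare the two strategies, as Figure 3 does.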