Generating refactored code accurately using reinforcement learning

Indranil Palit; Tushar Sharma

TL;DR

The paper tackles automatic extract-method refactoring by proposing a reinforcement learning workflow that fine-tunes code language models with Proximal Policy Optimization (PPO) to generate compilable, semantically correct refactorings. It builds a large Java dataset using SEART and RefactoringMiner, partitions it into supervised fine-tuning and RL subsets, and evaluates models with both quantitative metrics (BLEU, ROUGE, CodeBLEU) and qualitative checks such as unit-test validity. Results show that PLBART excels on the standard metrics and Code-T5 on qualitative correctness, but the strongest overall performance comes from a hybrid approach that combines supervised fine-tuning with RL alignment, which significantly increases unit-test successes and RefactoringMiner detections. The work demonstrates a viable path toward reliable automated refactoring tools and provides replication data, with potential applicability to other languages and refactoring types.

Abstract

Automated source code refactoring, particularly extract method refactoring, is a crucial and frequently employed technique during software development. Despite its importance and frequent use by practitioners, current automated techniques face significant limitations. These approaches often rely on developers to identify the precise bounds of refactoring opportunities in terms of source code statements. They also often fail to capture the semantic context and therefore offer no automated means of, for instance, suggesting a meaningful method name. To address these challenges, we propose a novel reinforcement learning-based approach for fine-tuning and aligning code language models to perform automated, intelligent extract method refactoring on Java source code. Our approach fine-tunes sequence-to-sequence generative models and aligns them using the Proximal Policy Optimization (PPO) algorithm. We use code compilation and the presence of the refactoring in the generated code as reward signals, providing a code-centric optimization process. Our experiments demonstrate that our approach significantly enhances the performance of large language models in code refactoring, as evidenced by both quantitative evaluation metrics such as BLEU, ROUGE, and CodeBLEU, and qualitative measures including syntactic and functional correctness. The supervised fine-tuned model, further aligned with PPO, surpasses traditional supervised fine-tuning by 11.96% and 16.45% in terms of BLEU and CodeBLEU scores, respectively. When subjected to a suite of 122 unit tests, the number of successful tests increased from 41 to 66 for the reinforcement learning aligned fine-tuned Code-T5 model, highlighting the effectiveness of our approach in producing functionally correct refactorings.
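
The two reward signals named in the abstract (whether the generated code compiles, and whether the refactoring is present in it) could be combined along the following lines. This is only a minimal sketch, not the authors' implementation: the function name `refactoring_reward`, the reward and penalty values, and the two checker callables are all hypothetical. In practice the checkers would presumably wrap a Java compiler and a detector such as RefactoringMiner; here they are passed in as plain functions so the sketch stays self-contained.

```python
from typing import Callable

# Hypothetical reward values; the abstract does not state the actual numbers.
COMPILE_REWARD = 1.0
REFACTORING_REWARD = 1.0
FAILURE_PENALTY = -1.0


def refactoring_reward(
    original: str,
    generated: str,
    compiles: Callable[[str], bool],
    has_extract_method: Callable[[str, str], bool],
) -> float:
    """Score one generated refactoring from two code-centric signals."""
    # Code that does not compile is penalized and earns no further credit.
    if not compiles(generated):
        return FAILURE_PENALTY
    reward = COMPILE_REWARD
    # Extra credit only when an extract-method refactoring is detected
    # between the original and the generated code.
    if has_extract_method(original, generated):
        reward += REFACTORING_REWARD
    return reward
```

A scalar of this kind would be returned to the PPO trainer once per generated sample. Gating the refactoring check behind compilation is a design assumption of the sketch, chosen so that non-compiling output can never be rewarded for merely resembling a refactoring.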
