ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models
Chengran Yang, Hong Jin Kang, Jieke Shi, David Lo
TL;DR
This work targets the problem that CodeLLMs often produce code that is functionally correct but inefficient at runtime. It introduces ACECode, a reinforcement learning fine-tuning framework that uses a training-free rewarder derived from code execution feedback to jointly optimize code efficiency ($G_e$) and correctness ($G_c$) via Proximal Policy Optimization (PPO). Across four state-of-the-art open-source CodeLLMs, ACECode yields significant gains in both correctness (pass@1) and efficiency (ECC and GET), outperforming the original, instruction-tuned, and PIE-tuned baselines, with improvements of up to $14.51\%$ in pass@1 and up to $10.86\%$ in GET, and it reduces runtime in $65\%$ to $72\%$ of cases. The framework requires no manually labeled data, and no execution environment at inference time, offering a robust, environment-agnostic approach to multi-objective code generation that could improve real-world software performance and sustainability.
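The TL;DR does not spell out how the rewarder scores a program, so the following is only a minimal Python sketch of one way a training-free, execution-based reward could combine correctness and efficiency. Everything in it is an assumption for illustration: the function names (`execution_reward`, `combine_reward`, `best_runtime`), the availability of a reference solution and test inputs at training time, the $-1$ penalty for incorrect code, and the clipped relative-runtime bonus are hypothetical choices, not ACECode's actual design.

```python
import time
from typing import Any, Callable, Sequence

# Hypothetical sketch of an execution-based reward; NOT ACECode's actual rewarder.
# Assumes a trusted reference solution and a few test inputs exist at training time.


def best_runtime(fn: Callable[[Any], Any], inputs: Sequence[Any], repeats: int = 3) -> float:
    """Best-of-`repeats` wall-clock time (seconds) to run `fn` on every input."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            fn(x)
        best = min(best, time.perf_counter() - start)
    return best


def combine_reward(correct: bool, cand_time: float, ref_time: float) -> float:
    """Correctness gates the reward; relative runtime reduction refines it."""
    if not correct:
        return -1.0  # wrong or crashing code always scores below any correct code
    if ref_time <= 0.0:
        return 1.0  # no usable baseline timing: reward correctness only
    # Fraction of the reference runtime saved, clipped to [-1, 1].
    gain = (ref_time - cand_time) / ref_time
    return 1.0 + max(-1.0, min(1.0, gain))  # correct code scores in [0, 2]


def execution_reward(
    candidate: Callable[[Any], Any],
    reference: Callable[[Any], Any],
    test_inputs: Sequence[Any],
    repeats: int = 3,
) -> float:
    """Check the candidate against the reference, then time both if it is correct."""
    try:
        correct = all(candidate(x) == reference(x) for x in test_inputs)
    except Exception:
        correct = False  # any runtime error counts as incorrect
    if not correct:
        return combine_reward(False, 0.0, 0.0)
    cand_time = best_runtime(candidate, test_inputs, repeats)
    ref_time = best_runtime(reference, test_inputs, repeats)
    return combine_reward(True, cand_time, ref_time)


# Toy usage: three "generated" programs computing sum(range(n)).
def slow_sum(n: int) -> int:
    return sum(i for i in range(n))  # O(n) reference solution


def fast_sum(n: int) -> int:
    return n * (n - 1) // 2  # O(1) closed form, same result


def buggy_sum(n: int) -> int:
    return n * (n + 1) // 2  # off by n: fails the correctness check


if __name__ == "__main__":
    inputs = [10_000, 50_000]
    print(execution_reward(fast_sum, slow_sum, inputs))   # above 1.0: correct and faster
    print(execution_reward(buggy_sum, slow_sum, inputs))  # -1.0: incorrect
```

Gating on correctness first means a faster but wrong program can never outscore a correct one, which reflects the concern that efficiency-only tuning can trade away correctness. Executing untrusted model output in-process, as this sketch does, is unsafe; a real training loop would run it in a sandbox with timeouts.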
Abstract
CodeLLMs have demonstrated remarkable advancements in software engineering tasks. However, while these models can generate functionally correct code, the code they produce is often inefficient at runtime. This inefficiency is particularly problematic in resource-constrained environments, where it harms software performance and sustainability. Existing approaches to optimizing the efficiency of CodeLLM-generated code, such as SOAP and PIE, have notable limitations. SOAP requires a compatible execution environment and predefined test cases to modify code iteratively, while PIE relies on instruction tuning, which improves efficiency but compromises correctness. These shortcomings highlight the need for a fine-tuning framework that optimizes both efficiency and correctness without relying on predefined test cases or specific execution environments. To bridge this gap, we introduce ACECode, a reinforcement learning-based fine-tuning framework that aligns CodeLLMs with the dual objectives of efficiency and correctness. ACECode combines three key steps: (1) generating code with an actor CodeLLM, (2) calculating a training-free reward signal from code execution feedback for each generated code snippet, and (3) optimizing the CodeLLM with the Proximal Policy Optimization (PPO) algorithm. This reward signal enables joint assessment of efficiency and correctness without manual labeling. We evaluate ACECode by fine-tuning four state-of-the-art (SOTA) CodeLLMs and comparing their generated code with that of three baselines: the original, instruction-tuned, and PIE-tuned CodeLLMs. Extensive experimental results suggest that ACECode significantly improves the efficiency and correctness of generated code over all baselines for all CodeLLMs. Specifically, CodeLLMs fine-tuned with ACECode improve pass@1 by 1.84% to 14.51% and reduce runtime in 65% to 72% of cases compared to the original CodeLLMs.
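Step (3) relies on PPO. For reference, the standard clipped surrogate objective that PPO maximizes is shown below. The abstract does not state ACECode's exact loss, advantage estimator, or hyperparameters, so this is the textbook form rather than the paper's formulation. Treating the prompt plus the tokens generated so far as the state $s_t$, the next token as the action $a_t$, and estimating the advantage $\hat{A}_t$ from the execution-based reward is an assumed mapping for the code-generation setting.

$$
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]
$$

The clip bounds how far each update can move the actor's token probabilities away from those of the policy $\pi_{\theta_{\text{old}}}$ that generated the samples, which helps keep fine-tuning stable.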
