PerfRL: A Small Language Model Framework for Efficient Code Optimization
Shukai Duan, Nikos Kanakaris, Xiongye Xiao, Heng Ping, Chenyu Zhou, Nesreen K. Ahmed, Guixiang Ma, Mihai Capota, Theodore L. Willke, Shahin Nazarian, Paul Bogdan
TL;DR
PerfRL tackles efficient code optimization for diverse hardware environments by combining small language models (SLMs) with reinforcement learning (RL) and unit-test feedback. It formalizes the objective as maximizing $cost(x,y_{best}) = eq(R,y_{best}) + perf(y_{best})$ and trains to maximize $P(y_{best}|x;\theta)$ using RRHF-based losses $L_{rank}$ and $L_{tuning}$. Using CodeT5 with 60M parameters on the PIE dataset, PerfRL achieves comparable or superior optimization performance to larger baselines while dramatically reducing training time and energy consumption, thanks to RL-driven fine-tuning and reward-based sample filtering. This approach enables robust, energy-efficient code optimization suitable for edge devices and low-resource settings, advancing practical AI-assisted software optimization.
Abstract
Code optimization is a challenging task that requires substantial expertise from developers. Human expertise alone, however, cannot keep pace with the rapid evolution of hardware architectures and software environments. In light of this, recent research proposes using machine learning and artificial intelligence techniques to automate the code optimization process. In this paper, we introduce PerfRL, a framework for code optimization. Our framework combines small language models (SLMs) with reinforcement learning (RL), so that an SLM can incorporate feedback from its environment, notably the results of unit tests, during the fine-tuning phase. Compared with existing models, PerfRL is more efficient in terms of speed and computational resource usage, which we attribute to its reduced number of training steps and its compatibility with SLMs. Furthermore, it substantially reduces the risk of logical and syntactical errors in the generated code. To evaluate our framework, we conduct experiments on the PIE dataset using a small language model (CodeT5) and a recent reinforcement learning algorithm, RRHF. We report metrics covering optimization quality and speedup. The results show that our approach achieves similar or better results than state-of-the-art models while using shorter training times and smaller pre-trained models.
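To make the RRHF-based losses $L_{rank}$ and $L_{tuning}$ named in the TL;DR more concrete, the following is a minimal sketch in plain Python. It follows the general RRHF formulation (a pairwise hinge on length-normalized log-probabilities, plus a negative log-likelihood term on the highest-reward candidate) and is not taken from the PerfRL implementation. The function names `sequence_score` and `rrhf_losses`, the input layout, and the example numbers are assumptions made for illustration; in practice the rewards would come from the unit-test-based cost described above.

```python
from typing import List, Sequence, Tuple


def sequence_score(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-probability of one candidate program."""
    return sum(token_logprobs) / len(token_logprobs)


def rrhf_losses(
    candidates_logprobs: List[Sequence[float]],
    rewards: Sequence[float],
) -> Tuple[float, float]:
    """Return (L_rank, L_tuning) for one prompt.

    candidates_logprobs[i] holds the per-token log-probabilities the model
    assigns to candidate i; rewards[i] is that candidate's scalar reward
    (hypothetically, equivalence plus performance from unit tests).
    """
    n = len(rewards)
    scores = [sequence_score(lp) for lp in candidates_logprobs]

    # Ranking loss: penalize every pair where the lower-reward candidate
    # is scored higher than the higher-reward candidate.
    l_rank = 0.0
    for i in range(n):
        for j in range(n):
            if rewards[i] < rewards[j]:
                l_rank += max(0.0, scores[i] - scores[j])

    # Tuning loss: negative log-likelihood of the best-reward candidate.
    best = max(range(n), key=lambda k: rewards[k])
    l_tuning = -sum(candidates_logprobs[best])

    return l_rank, l_tuning


# Illustrative values only: candidate 1 has the higher reward but the lower score.
example_logprobs = [[-1.0, -1.0], [-2.0, -2.0]]
example_rewards = [0.0, 1.0]
l_rank, l_tuning = rrhf_losses(example_logprobs, example_rewards)
```

In this toy case the model prefers the lower-reward candidate, so the ranking term is positive and pushes the scores toward the reward ordering, while the tuning term raises the likelihood of the best candidate.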
