PerfRL: A Small Language Model Framework for Efficient Code Optimization

Shukai Duan, Nikos Kanakaris, Xiongye Xiao, Heng Ping, Chenyu Zhou, Nesreen K. Ahmed, Guixiang Ma, Mihai Capota, Theodore L. Willke, Shahin Nazarian, Paul Bogdan

TL;DR

PerfRL tackles code optimization for diverse hardware environments by combining small language models (SLMs) with reinforcement learning (RL) and unit-test feedback. It formalizes the objective as maximizing $cost(x, y_{best}) = eq(R, y_{best}) + perf(y_{best})$ and trains the model to maximize $P(y_{best} \mid x; \theta)$ using the RRHF-based losses $L_{rank}$ and $L_{tuning}$. Using CodeT5 with 60M parameters on the PIE dataset, PerfRL achieves optimization performance comparable or superior to that of larger baselines while substantially reducing training time and energy consumption, owing to RL-driven fine-tuning and reward-based sample filtering. This makes the approach suitable for energy-efficient code optimization on edge devices and in other low-resource settings.
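
The RRHF losses named above can be sketched for a single input program. This is a minimal sketch assuming the original RRHF formulation: each candidate's score $p_i$ is its length-normalized log-probability, $L_{rank}$ sums $\max(0, p_i - p_j)$ over all candidate pairs with rewards $r_i < r_j$, $L_{tuning}$ is the negative log-likelihood of the highest-reward candidate ($y_{best}$), and the two are added without weights. How PerfRL weights or combines them exactly is not stated here, and all function names are hypothetical. Plain Python floats stand in for the tensors a real training loop would differentiate.

```python
from typing import List, Sequence


def length_normalized_score(token_logprobs: Sequence[float]) -> float:
    """Score p_i of one candidate: mean log-probability of its tokens."""
    return sum(token_logprobs) / len(token_logprobs)


def rank_loss(scores: Sequence[float], rewards: Sequence[float]) -> float:
    """L_rank: sum of max(0, p_i - p_j) over all pairs with r_i < r_j.

    A pair contributes only when the lower-reward candidate i is scored
    higher by the model than the higher-reward candidate j.
    """
    loss = 0.0
    for p_i, r_i in zip(scores, rewards):
        for p_j, r_j in zip(scores, rewards):
            if r_i < r_j:
                loss += max(0.0, p_i - p_j)
    return loss


def tuning_loss(candidate_logprobs: List[Sequence[float]], rewards: Sequence[float]) -> float:
    """L_tuning: negative log-likelihood of the highest-reward candidate (y_best)."""
    best = max(range(len(rewards)), key=lambda k: rewards[k])
    return -sum(candidate_logprobs[best])


def total_loss(candidate_logprobs: List[Sequence[float]], rewards: Sequence[float]) -> float:
    """L = L_rank + L_tuning (assumed unweighted sum, following RRHF)."""
    scores = [length_normalized_score(lp) for lp in candidate_logprobs]
    return rank_loss(scores, rewards) + tuning_loss(candidate_logprobs, rewards)


if __name__ == "__main__":
    # Three candidates for one input program: per-token log-probs and rewards.
    logprobs = [[-1.0, -1.0], [-0.5], [-2.0, -2.0]]
    rewards = [0.0, 1.0, 2.0]
    print(total_loss(logprobs, rewards))  # 6.5
```

In the example, the model scores the best candidate lowest, so $L_{rank}$ is 2.5 (two mis-ordered pairs) and $L_{tuning}$ is 4.0; minimizing both pushes probability mass toward $y_{best}$.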

Abstract

Code optimization is a challenging task that requires substantial expertise from developers. Yet human expertise alone cannot keep pace with the rapid evolution of hardware architectures and software environments. In light of this, recent research proposes using machine learning and artificial intelligence techniques to automate code optimization. In this paper, we introduce PerfRL, a framework designed to tackle the problem of code optimization. Our framework combines small language models (SLMs) with reinforcement learning (RL), enabling SLMs to incorporate feedback from their environment, notably unit-test results, during fine-tuning. When benchmarked against existing models, PerfRL is more efficient in terms of speed and computational resource usage, because it needs fewer training steps and works with SLMs. Furthermore, it substantially reduces the risk of logical and syntactical errors. To evaluate our framework, we conduct experiments on the PIE dataset using a lightweight language model (CodeT5) and a recent reinforcement learning algorithm, RRHF. We evaluate the results with metrics of optimization quality and speedup. They show that our approach achieves similar or better results than state-of-the-art models while requiring shorter training times and smaller pre-trained models.
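
The abstract refers to metrics of optimization quality and speedup without defining them. The sketch below computes two plausible ones over a test set, under hypothetical assumptions that are not taken from the paper: a program counts as optimized when it passes its unit tests and runs at least 10% faster than its input (threshold=1.1), and a failing or slower program contributes a speedup of 1.0 because the original program would be kept. The function name optimization_metrics is illustrative.

```python
from statistics import mean
from typing import Sequence, Tuple


def optimization_metrics(
    src_times: Sequence[float],
    gen_times: Sequence[float],
    passed: Sequence[bool],
    threshold: float = 1.1,
) -> Tuple[float, float]:
    """Return (percent_optimized, mean_speedup) over a test set.

    Hypothetical conventions:
    - speedup = runtime of the input program / runtime of the generated one;
    - failing or slower programs count as speedup 1.0 (keep the original);
    - a program is "optimized" if it passes and its speedup >= threshold.
    Assumes non-empty inputs and positive runtimes.
    """
    speedups = []
    optimized = 0
    for t_src, t_gen, ok in zip(src_times, gen_times, passed):
        s = t_src / t_gen if ok else 1.0
        s = max(s, 1.0)  # never report a slowdown; fall back to the input
        speedups.append(s)
        if ok and s >= threshold:
            optimized += 1
    return 100.0 * optimized / len(speedups), mean(speedups)


if __name__ == "__main__":
    # Program 1 is 2x faster, program 2 is unchanged, program 3 fails its tests.
    print(optimization_metrics([2.0, 1.0, 3.0], [1.0, 1.0, 1.0], [True, True, False]))
```

Clamping failures and slowdowns to 1.0 keeps the mean speedup from rewarding incorrect programs, which is one reason a correctness filter matters for such metrics.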

Paper Structure

This paper contains 6 sections, 12 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Overview of the PerfRL framework. (a) Training. We first fine-tune an SLM on the whole training dataset. Then, we pass the input programs to the fine-tuned SLM to generate a predefined number of optimized samples for each input program. We assign a score to each sample and compute its reward using the reward model. We use the scores and rewards to compute the $L_{rank}$ and $L_{tuning}$ losses for RL. The final loss $L$ combines these two losses and is used to retrain the model. In this way, our framework incorporates feedback from unit tests into its training process, making it more likely to generate optimized code that is free of syntactical and logical errors and thereby mitigating hallucinations. (b) Reward model. The reward model assigns a different reward depending on the status of the code (e.g., whether it compiles, raises a run-time error, or passes all the unit tests). (c) Inference. During inference, the final SLM generates multiple candidate programs, which are then evaluated by the reward model. All samples that do not receive an R4 reward are filtered out, and the remaining programs are returned as the optimized code.
  • Figure 2: (left) Pass rate and optimization rate on validation data over RL steps for the fine-tuned CodeT5 model. (right) Compilation, pass, and optimization rates per RL step for the programs generated by the fine-tuned CodeT5 model. Each rate is the number of compiled, passed, or optimized programs divided by the total number of generated programs.
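
The reward model, the R4 filter, and the rate definitions in the two captions can be sketched together. The captions only state that rewards depend on whether code compiles, raises a run-time error, or passes all unit tests, and that inference keeps R4 samples; the tier mapping below (R1: compile error, R2: run-time error, R3: fails unit tests, R4: passes all unit tests) and the numeric reward values are hypothetical. Whether a program runs faster than its input is assumed to be measured separately and passed in as a flag.

```python
from enum import IntEnum
from typing import Iterable, List, Sequence, Tuple


class Status(IntEnum):
    """Hypothetical mapping of program status to reward tier (R1 < ... < R4)."""
    COMPILE_ERROR = 1  # R1: does not compile
    RUNTIME_ERROR = 2  # R2: compiles but crashes at run time
    FAILS_TESTS = 3    # R3: runs but fails at least one unit test
    PASSES_TESTS = 4   # R4: passes all unit tests


# Hypothetical reward values; the paper's actual R1..R4 values may differ.
REWARD = {
    Status.COMPILE_ERROR: -1.0,
    Status.RUNTIME_ERROR: -0.6,
    Status.FAILS_TESTS: -0.3,
    Status.PASSES_TESTS: 1.0,
}


def filter_r4(samples: Iterable[Tuple[str, Status]]) -> List[str]:
    """Inference-time filter: keep only candidates that receive the R4 reward."""
    return [code for code, status in samples if status is Status.PASSES_TESTS]


def rates(statuses: Sequence[Status], faster: Sequence[bool]) -> Tuple[float, float, float]:
    """Compilation, pass, and optimization rates over all generated programs.

    A program counts as optimized only if it passes all unit tests and
    its `faster` flag is set.
    """
    n = len(statuses)
    compiled = sum(s >= Status.RUNTIME_ERROR for s in statuses)
    passed = sum(s is Status.PASSES_TESTS for s in statuses)
    optimized = sum(s is Status.PASSES_TESTS and f for s, f in zip(statuses, faster))
    return compiled / n, passed / n, optimized / n


if __name__ == "__main__":
    statuses = [Status.PASSES_TESTS, Status.PASSES_TESTS,
                Status.RUNTIME_ERROR, Status.COMPILE_ERROR]
    print(rates(statuses, [True, False, True, False]))  # (0.75, 0.5, 0.25)
```

Using an ordered IntEnum lets "compiled" be expressed as any tier at or above R2, which mirrors the idea that each reward tier implies the checks below it succeeded.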