Investigating Execution-Aware Language Models for Code Optimization

Federico Di Menna, Luca Traini, Gabriele Bavota, Vittorio Cortellessa

TL;DR

The paper investigates whether incorporating run-time code execution information into CodeT5+-based models improves automated code optimization. It designs twelve execution-aware variants across four execution aspects (line executions, line coverage, branch coverage, variable states) and three training strategies (S1-S3), evaluated on the PIE dataset with CodeNet traces. Across metrics like correctness, speedup, and %Opt, execution-aware models generally underperform the baseline and sometimes reduce semantic correctness, indicating limited practical benefit for code optimization under these settings. The findings suggest focusing future work on different execution aspects, larger models, or alternative training strategies to meaningfully enhance code optimization performance.

Abstract

Code optimization is the process of enhancing code efficiency while preserving its intended functionality. This process often requires a deep understanding of the code's execution behavior at run-time to identify and address inefficiencies effectively. Recent studies have shown that language models can play a significant role in automating code optimization. However, these models may have insufficient knowledge of how code executes at run-time. To address this limitation, researchers have developed strategies that integrate code execution information into language models. These strategies have shown promise, enhancing the effectiveness of language models in various software engineering tasks. However, despite the close relationship between code execution behavior and efficiency, the specific impact of these strategies on code optimization remains largely unexplored. This study investigates how incorporating code execution information into language models affects their ability to optimize code. Specifically, we apply three different training strategies to incorporate four code execution aspects (line executions, line coverage, branch coverage, and variable states) into CodeT5+, a well-known language model for code. Our results indicate that execution-aware models provide limited benefits compared to the standard CodeT5+ model in optimizing code.
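
To make two of the execution aspects named above concrete, the following minimal sketch records line executions (how many times each line runs) and derives line coverage (which lines ran at least once) for a small function. It is an illustration only, not the paper's tracing pipeline: the use of Python's sys.settrace, the helper trace_line_executions, and the sample function sum_positive are all assumptions introduced here.

```python
import sys
from collections import Counter


def trace_line_executions(func, *args):
    """Run func(*args) and count how many times each of its lines executes.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the traced function.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Key by line offset from the `def` line (first body line is 1).
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(previous)
    return result, dict(counts)


def sum_positive(values):  # hypothetical sample function
    total = 0
    for v in values:
        if v > 0:
            total += v
    return total


result, executions = trace_line_executions(sum_positive, [3, -1, 4])
# Line coverage is the set of lines that executed at least once.
covered = sorted(executions)
```

Here executions maps each line offset to its execution count, and covered is the corresponding coverage view. Line coverage discards the counts, which is why the two aspects carry different amounts of information about run-time behavior.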

Paper Structure

This paper contains 38 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Overview of the proposed training strategies for building execution-aware language models.
  • Figure 2: Execution trace of the sample function dummy_sum(). A question mark (?) indicates that a variable has not yet been assigned a value. The figure also depicts the variable states information and the associated quantized values.
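
The kind of variable-state trace that Figure 2 describes can be approximated with the sketch below, which snapshots every local variable before each line runs and writes "?" for variables that have no value yet. This is a hedged illustration under stated assumptions: the body of dummy_sum() is not given in the text above, so the one used here is hypothetical, as is the helper trace_variable_states. The quantization step mentioned in the caption is not reproduced because its scheme is not described here.

```python
import sys


def trace_variable_states(func, *args):
    """Run func(*args) and record local variable values before each line.

    Hypothetical helper for illustration; not taken from the paper.
    """
    code = func.__code__
    names = code.co_varnames  # arguments first, then other locals
    steps = []

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the traced function.
        if frame.f_code is not code:
            return None
        if event == "line":
            # "?" marks a variable that has not been assigned yet.
            snapshot = {name: frame.f_locals.get(name, "?") for name in names}
            steps.append((frame.f_lineno - code.co_firstlineno, snapshot))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, steps


def dummy_sum(n):  # hypothetical body; only the function name is given
    total = 0
    for i in range(n):
        total += i
    return total


result, steps = trace_variable_states(dummy_sum, 2)
for line_offset, state in steps:
    print(line_offset, state)
```

Each recorded step pairs a line offset with the variable values seen just before that line executes, so the first step shows total and i as "?" while n already holds its argument value.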