A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama

Vlad-Andrei Cursaru, Laura Duits, Joel Milligan, Damla Ural, Berta Rodriguez Sanchez, Vincenzo Stoico, Ivano Malavolta

TL;DR

This paper evaluates the energy efficiency of source code generated by Code Llama against human-written implementations across three problems and three programming languages, using a full factorial multi-factor, multi-treatment (MF-MT) design. Code is generated on a PC, while energy is measured on a Raspberry Pi with a Monsoon Power Monitor, enabling comparison across language, problem type, prompt, and temperature. Results show that energy efficiency depends strongly on language and problem: Code Llama-produced JavaScript is often more energy-efficient than human code, while generated code overall can be more energy-intensive. Prompting for energy efficiency generally has limited impact. The study highlights the need for energy-aware training and evaluation of LLMs in software development contexts and provides a replication package to support future, broader investigations.
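As a rough illustration of what a full factorial design implies, the sketch below enumerates every combination of factor levels. The three languages and the two prompt types come from this summary; the problem names and temperature values are hypothetical placeholders, and the assumption that prompt and temperature are fully crossed with each other is not confirmed by the text.

```python
from itertools import product

# Factor levels. Languages and prompt types are named in the summary;
# problem names and temperature values are hypothetical placeholders.
LANGUAGES = ["C++", "JavaScript", "Python"]
PROBLEMS = ["problem_1", "problem_2", "problem_3"]  # placeholder names
PROMPTS = ["basic", "energy_efficient"]
TEMPERATURES = [0.2, 0.8]  # hypothetical values, not taken from the paper


def generated_treatments():
    """Return every combination of factor levels for generated code."""
    return [
        {
            "source": "code_llama",
            "language": language,
            "problem": problem,
            "prompt": prompt,
            "temperature": temperature,
        }
        for language, problem, prompt, temperature in product(
            LANGUAGES, PROBLEMS, PROMPTS, TEMPERATURES
        )
    ]


def human_baselines():
    """Human-written code varies only by language and problem."""
    return [
        {"source": "human", "language": language, "problem": problem}
        for language, problem in product(LANGUAGES, PROBLEMS)
    ]


if __name__ == "__main__":
    print(len(generated_treatments()), "generated treatments")
    print(len(human_baselines()), "human baselines")
```

With these placeholder levels, the number of generated treatments is the product of the level counts, so adding one more temperature value adds a whole extra slice of language, problem, and prompt combinations.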

Abstract

Context. Nowadays, 83% of software developers use Large Language Models (LLMs) to generate code. LLMs have recently become essential for increasing the productivity of software developers and decreasing the time and cost of software development. Developers ranging from novices to experts use LLM tools not only to detect and patch bugs, but also to integrate generated code into their software. However, as of today there is no objective assessment of the energy efficiency of the source code generated by LLM tools. Released in August 2023, Code Llama is one of the most recent LLM tools. Goal. In this paper, we present an empirical study that assesses the energy efficiency of code generated by Code Llama with respect to human-written source code. Method. We design an experiment involving three human-written benchmarks implemented in C++, JavaScript, and Python. We ask Code Llama to generate the code of the benchmarks using different prompts and temperatures. We then execute both the human-written and the generated implementations and profile their energy efficiency. Results. Our study shows that the energy efficiency of code generated by Code Llama is heavily dependent on the chosen programming language and the specific code problem at hand. Human implementations tend to be more energy efficient overall, although generated JavaScript code outperforms its human counterpart. Moreover, explicitly asking Code Llama to generate energy-efficient code results in equal or worse energy efficiency, and using different temperatures does not seem to affect the energy efficiency of the generated code. Conclusions. According to our results, code generated using Code Llama does not guarantee energy efficiency, even when prompted to do so. Therefore, software developers should evaluate the energy efficiency of generated code before integrating it into the software system under development.
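To make "profiling energy efficiency" concrete, the sketch below shows one common way to turn a sampled power trace into an energy figure: integrating power over time with the trapezoidal rule. It is an illustration under assumptions, not the authors' measurement pipeline; the function name `energy_joules` and the input format (timestamps in seconds, power in watts) are hypothetical.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds).

    Uses the trapezoidal rule, so samples may be unevenly spaced.
    Returns the energy in joules.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    if len(timestamps_s) < 2:
        return 0.0  # a single sample spans no time, hence no energy

    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


if __name__ == "__main__":
    # A constant 2 W draw for 2 seconds.
    print(energy_joules([0.0, 1.0, 2.0], [2.0, 2.0, 2.0]))
```

Comparing two implementations then amounts to comparing the integrated energy of their runs, ideally averaged over repeated executions.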

Paper Structure (30 sections, 7 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: Experimental Setting
  • Figure 2: Average energy consumption of human and generated code (in log scale).
  • Figure 3: Average energy consumption of code generated using basic and energy-efficient prompts (in log scale).
  • Figure 4: Average energy consumption of code generated using different temperature values (in log scale).