Learn to Code Sustainably: An Empirical Study on LLM-based Green Code Generation

Tina Vartziotis, Ippolyti Dellatolas, George Dasoulas, Maximilian Schmidt, Florian Schneider, Tim Hoffmann, Sotirios Kotsopoulos, Michael Keckeisen

TL;DR

An empirical study on green code and an overview of green coding practices, as well as metrics used to quantify the sustainability awareness of AI models, shed light on the current capacity of AI models to contribute to sustainable software development.

Abstract

The increasing use of information technology has made data centers responsible for a significant share of energy consumption and carbon emissions. These contributions are expected to rise with the growing demand for big data analytics, increasing digitization, and the development of large artificial intelligence (AI) models. The need to address the environmental impact of software development has led to increased interest in green (sustainable) coding and to claims that the use of AI models can yield energy efficiency gains. Here, we provide an empirical study on green code and an overview of green coding practices, as well as metrics used to quantify the sustainability awareness of AI models. In this framework, we evaluate the sustainability of auto-generated code. The auto-generated code considered in this study is produced by three commercial generative AI language models: GitHub Copilot, OpenAI ChatGPT-3, and Amazon CodeWhisperer. To quantify the sustainability awareness of these AI models, we propose a definition of the code's "green capacity", based on certain sustainability metrics. We compare the performance and green capacity of human-generated code and code generated by the three AI language models in response to problem statements ranging from easy to hard. Our findings shed light on the current capacity of AI models to contribute to sustainable software development.

Paper Structure (20 sections, 3 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Evaluation steps performed for Green Capacity.
  • Figure 2: Comparison plot of Green Capacity values among the three automated code generation models (Copilot, CodeWhisperer, ChatGPT) across six coding problems (3Sum, Cookies, Median, Network, Search, Sort). For each reported $\text{GC}_\text{AI}$ value (depicted with the full line bar), we append on its right the computed $\text{GC}_\text{Human}$, which compares the human submission with the initially generated code for each model. A higher left value implies that the code generation model performed better than the human submission across the sustainability metrics. Absence of data denotes a value of 0.
  • Figure 3: Heat Map Displaying Performance Delta (PD) in Energy Across Various Tasks and Tools: This map visually represents the difference between initial and optimized solutions. Each cell indicates the PD's contribution to Green Capacity (GC) values for a specific problem task and AI tool. Positive PD values enhance GC, signifying a beneficial impact on sustainability. In contrast, negative or zero PD values indicate a non-contributory effect on GC.
  • Figure 4: Heat Map Displaying Performance Delta (PD) in Runtime Across Various Tasks and Tools: This map illustrates the variance in energy efficiency across different AI tools and coding problems. It highlights instances where tools like ChatGPT and Copilot either fail to produce efficient code or generate code less efficient than the initial implementation, underscoring the challenges in aligning code optimization with the energy efficiency sustainability metric.
  • Figure 5: Heat Map Displaying Performance Delta (PD) in Memory Across Various Tasks and Tools: This map shows the difference between initial and optimized solutions. Each cell indicates the PD's contribution to GC values for a specific problem task and AI tool. Positive PD values enhance GC, signifying a beneficial impact on sustainability. We observe many non-positive PD values with high variance, which suggests that the models have little awareness of this metric.
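
The sign convention used in the captions of Figures 3 to 5 can be illustrated with a minimal sketch. The paper's actual equations are not reproduced on this page, so everything here is an assumption: PD is taken to be the initial measurement minus the optimized one for a lower-is-better metric, a non-positive PD is treated as contributing nothing to GC, and the function names and sample numbers are hypothetical.

```python
from typing import Dict, Tuple


def performance_delta(initial: float, optimized: float) -> float:
    """Assumed convention: initial minus optimized for a lower-is-better
    metric (energy, runtime, memory). Positive means the optimized
    solution consumes less than the initial one."""
    return initial - optimized


def gc_contribution(pd: float) -> float:
    """Per the figure captions, only positive PD values enhance GC;
    zero or negative values are non-contributory (modelled here as 0)."""
    return max(pd, 0.0)


# Hypothetical measurements as (initial, optimized); not data from the paper.
measurements: Dict[str, Tuple[float, float]] = {
    "energy_j": (12.0, 9.0),    # optimized uses less energy
    "runtime_s": (1.5, 1.5),    # no change
    "memory_mb": (40.0, 44.0),  # optimized uses more memory
}

contributions = {
    name: gc_contribution(performance_delta(initial, optimized))
    for name, (initial, optimized) in measurements.items()
}
print(contributions)
```

Under these assumptions only the energy entry contributes; the unchanged runtime and the memory regression both map to zero, which corresponds to the non-contributory cells described for the heat maps.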