Green My LLM: Studying the key factors affecting the energy consumption of code assistants

Tristan Coignion, Clément Quinton, Romain Rouvoy

TL;DR

This paper investigates the energy consumption of LLM-based code assistants by simulating developer interactions with GitHub Copilot and analyzing various configuration factors, and reveals that careful adjustments can lead to significant energy savings.

Abstract

In recent years, Large Language Models (LLMs) have significantly improved in generating high-quality code, enabling their integration into developers' Integrated Development Environments (IDEs) as code assistants. These assistants, such as GitHub Copilot, deliver real-time code suggestions and can greatly enhance developers' productivity. However, the environmental impact of these tools, in particular their energy consumption, remains a key concern. This paper investigates the energy consumption of LLM-based code assistants by simulating developer interactions with GitHub Copilot and analyzing various configuration factors. We collected a dataset of development traces from 20 developers and conducted extensive software project development simulations to measure energy usage under different scenarios. Our findings reveal that the energy consumption and performance of code assistants are influenced by various factors, such as the number of concurrent developers, model size, quantization methods, and the use of streaming. Notably, a substantial portion of generation requests made by GitHub Copilot is either canceled or rejected by developers, indicating a potential area for reducing wasted computations. Based on these findings, we share actionable insights into optimizing configurations for different use cases, demonstrating that careful adjustments can lead to significant energy savings.

Paper Structure

This paper contains 23 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Various statistics on the participants' usage of the code assistant and the time they took to finish the experiment. The first figure represents the ratio of the number of suggestions accepted by the user to the number of suggestions shown to the user. The second figure represents the ratio of the number of accepted suggestions to the total number of suggestions requested by the code assistant. The third figure shows the frequency of requests made by the participant, and the last figure represents the time taken by the participant to finish the experiment.
  • Figure 2: Energy impact ratio from switching from one option to another. A ratio of 1 means no change, a ratio of 2 means the energy consumption doubled, and so on. Points correspond to the ratio in energy when comparing neighboring configurations.
  • Figure 3: Latency impact ratio from switching from one option to another. A ratio of 1 means no change, a ratio of 2 means the latency doubled, and so on. Points correspond to the ratio in latency when comparing neighboring configurations.
  • Figure 4: Evolution of the average power consumption and the latency depending on the number of developers, for the following configuration: StarCoder2-7B; no quantization; no streaming; manual trigger; maximum 1000 concurrent requests; 4 GPUs. The latency and power consumption are overlaid to make it easy to see how they interact as the number of developers increases.
  • Figure 5: Percentage of requests sent by GitHub Copilot depending on their completion state. Red-tinted categories represent requests that did not benefit the user. Blue-tinted categories represent requests that benefited the user.
  • ...and 1 more figure
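
The impact ratio described in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. It assumes that "neighboring configurations" are two configurations that differ in a single option, and that the ratio is the measurement after the switch divided by the measurement before it. The function name `impact_ratio` and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import median


def impact_ratio(before: float, after: float) -> float:
    """Ratio of a metric (energy or latency) after switching one option.

    1.0 means no change, 2.0 means the metric doubled.
    """
    if before <= 0:
        raise ValueError("baseline measurement must be positive")
    return after / before


# Hypothetical (before, after) energy measurements in joules for pairs of
# configurations assumed to differ in exactly one option. Made-up values.
neighbor_pairs = [(100.0, 200.0), (80.0, 120.0), (50.0, 50.0)]

# One point per pair, as in the figures' description of per-pair ratios.
ratios = [impact_ratio(before, after) for before, after in neighbor_pairs]

# A single summary value for the option switch (median is an assumption here).
typical_ratio = median(ratios)
```

The same function applies to latency by passing latency measurements instead of energy measurements.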