GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation

Shashikant Ilager, Lukas Florian Briem, Ivona Brandic

TL;DR

GREEN-CODE tackles the rising energy cost of LLM-based code generation by introducing dynamic early exits guided by a reinforcement learning policy. It couples a fine-tuning approach (LITE) that enables intermediate-layer decoding with a single LM head and a PPO-trained RL agent that decides exit points on the fly, balancing energy, latency, and accuracy. Evaluations on OPT 2.7B and Llama 3.2 3B across JavaCorpus and PY150 show energy and latency reductions of roughly 23–50% with comparable CodeBLEU and ROUGE-L scores, and the solution is demonstrated as a VS Code extension for practical use. By enabling real-time, energy-aware code generation without wholesale accuracy degradation, this work advances sustainable AI in software engineering and paves the way for deployment in edge and privacy-sensitive environments.

Abstract

Large Language Models (LLMs) are becoming integral to daily life, showcasing their vast potential across various Natural Language Processing (NLP) tasks. Beyond NLP, LLMs are increasingly used in software development tasks, such as code completion, modification, bug fixing, and code translation. Software engineers widely use tools like GitHub Copilot and Amazon Q, streamlining workflows and automating tasks with high accuracy. While the resource and energy intensity of LLM training is often highlighted, inference can be even more resource-intensive over time, as it is a continuous process with a high number of invocations. Therefore, developing resource-efficient alternatives for LLM inference is crucial for sustainability. This work proposes GREEN-CODE, a framework for energy-aware code generation in LLMs. GREEN-CODE performs dynamic early exit during LLM inference. We train a Reinforcement Learning (RL) agent that learns to balance the trade-offs between accuracy, latency, and energy consumption. Our approach is evaluated on two open-source LLMs, Llama 3.2 3B and OPT 2.7B, using the JavaCorpus and PY150 datasets. Results show that our method reduces energy consumption by 23–50% on average for code generation tasks without significantly affecting accuracy.
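
To make the idea of dynamic early exit concrete, the following is a minimal toy sketch in pure Python, not the authors' implementation. It assumes a stack of layers whose intermediate hidden states can all be decoded by one shared LM head, and an exit policy that is consulted after every layer. All names (`make_toy_model`, `decode_token_with_early_exit`, `fixed_exit_policy`) and the inputs the policy sees are hypothetical; in GREEN-CODE the policy would be the trained RL agent, whereas here a fixed exit layer stands in for it.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def make_toy_model(num_layers, hidden, vocab, seed=0):
    """Build random layer weights and one shared LM head (toy stand-in for an LLM)."""
    rng = random.Random(seed)
    layers = [
        [[rng.gauss(0, 1) / math.sqrt(hidden) for _ in range(hidden)]
         for _ in range(hidden)]
        for _ in range(num_layers)
    ]
    lm_head = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]
    return layers, lm_head


def fixed_exit_policy(exit_layer):
    """Placeholder policy: exit once `exit_layer` layers have run.

    Assumption: a learned agent would replace this with a decision
    based on the current state rather than a constant.
    """
    return lambda hidden_state, layer_idx: layer_idx + 1 >= exit_layer


def decode_token_with_early_exit(hidden_state, layers, lm_head, should_exit):
    """Run layers one by one and decode with the shared head at the exit point.

    Returns (token_id, layers_used). Layers after the exit are skipped,
    which is where the compute (and hence energy) saving comes from.
    """
    for idx, layer in enumerate(layers):
        # Toy residual update; a real transformer block goes here.
        update = matvec(layer, hidden_state)
        hidden_state = [h + math.tanh(u) for h, u in zip(hidden_state, update)]
        is_last = idx == len(layers) - 1
        if is_last or should_exit(hidden_state, idx):
            logits = matvec(lm_head, hidden_state)  # same head for every exit
            token = max(range(len(logits)), key=logits.__getitem__)
            return token, idx + 1
    raise ValueError("model has no layers")


if __name__ == "__main__":
    layers, lm_head = make_toy_model(num_layers=6, hidden=8, vocab=10)
    start = [0.1] * 8
    token, used = decode_token_with_early_exit(
        start, layers, lm_head, fixed_exit_policy(3)
    )
    print(f"token={token}, layers used={used} of {len(layers)}")
```

The policy is passed in as a plain callable so that a fixed-layer exit (as in the fixed-exiting baseline of Figure 1) and a learned, per-token decision can be swapped without changing the decoding loop.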

Paper Structure

This paper contains 30 sections, 4 equations, 13 figures, and 4 tables.

Figures (13)

  • Figure 1: Performance, energy and latency of OPT-2.7B and Llama3.2-3B on JavaCorpus and PY150 with fixed exiting.
  • Figure 2: A high-level view of the system model.
  • Figure 3: $w_i$ distribution for Llama
  • Figure 4: Aggregated Loss of fine-tuned models.
  • Figure 5: Illustration of the RL environment.
  • ...and 8 more figures