Toward Green Code: Prompting Small Language Models for Energy-Efficient Code Generation

Humza Ashraf, Syed Muhammad Danish, Shadikur Rahman, Zeeshan Sattar

TL;DR

This work investigates whether prompt engineering can steer Small Language Models (SLMs) toward energy-efficient code generation. Using four open-source SLMs and 150 LeetCode Python problems, the authors compare four prompting strategies (role, zero-shot, few-shot, and chain-of-thought (CoT)) and measure runtime, memory, and energy against a human-written baseline. CoT prompting yields consistent energy savings for Qwen2.5-Coder-3B and StableCode-3B, while CodeLlama-7B and Phi-3-Mini-4K fail to beat the baseline under any strategy. Prompting benefits are therefore highly model-dependent, and the prompting strategy must be paired carefully with the specific SLM to achieve greener code generation. The findings show both the potential and the limits of prompt engineering as a practical tool for reducing the environmental footprint of AI-assisted coding. The study also provides quantitative benchmarks (e.g., a baseline energy of 1.7122 mWh) to guide future work on sustainable code generation.

Abstract

There is a growing concern about the environmental impact of large language models (LLMs) in software development, particularly due to their high energy use and carbon footprint. Small Language Models (SLMs) offer a more sustainable alternative, requiring fewer computational resources while remaining effective for fundamental programming tasks. In this study, we investigate whether prompt engineering can improve the energy efficiency of SLMs in code generation. We evaluate four open-source SLMs, StableCode-Instruct-3B, Qwen2.5-Coder-3B-Instruct, CodeLlama-7B-Instruct, and Phi-3-Mini-4K-Instruct, across 150 Python problems from LeetCode, evenly distributed into easy, medium, and hard categories. Each model is tested under four prompting strategies: role prompting, zero-shot, few-shot, and chain-of-thought (CoT). For every generated solution, we measure runtime, memory usage, and energy consumption, comparing the results with a human-written baseline. Our findings show that CoT prompting provides consistent energy savings for Qwen2.5-Coder and StableCode-3B, while CodeLlama-7B and Phi-3-Mini-4K fail to outperform the baseline under any prompting strategy. These results highlight that the benefits of prompting are model-dependent and that carefully designed prompts can guide SLMs toward greener software development.
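The comparison described in the abstract (runtime, memory, and energy of each generated solution against a human-written baseline) can be illustrated with a minimal sketch. The source does not state which measurement tools the authors used, so everything here is an assumption: the helper names `profile_solution` and `energy_change_pct` are hypothetical, `two_sum` is a stand-in for a generated solution, and runtime and memory are taken with Python's standard `time` and `tracemalloc` modules. The sketch does not measure energy itself; it assumes an energy reading in mWh comes from an external meter or profiler and only expresses it relative to the reported baseline of 1.7122 mWh.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict

# Baseline energy reported in the source (human-written solutions), in mWh.
BASELINE_ENERGY_MWH = 1.7122


def profile_solution(func: Callable[..., Any], *args: Any, repeats: int = 5) -> Dict[str, float]:
    """Return best-of-N wall-clock runtime (s) and peak traced heap usage (bytes).

    Hypothetical helper: not the authors' measurement harness.
    """
    best_runtime = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best_runtime = min(best_runtime, time.perf_counter() - start)

    # Memory is traced in a separate run because tracemalloc slows execution.
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {"runtime_s": best_runtime, "peak_bytes": float(peak)}


def energy_change_pct(energy_mwh: float, baseline_mwh: float = BASELINE_ENERGY_MWH) -> float:
    """Percent change relative to the baseline; negative values mean savings."""
    return (energy_mwh - baseline_mwh) / baseline_mwh * 100.0


def two_sum(nums, target):
    """Hypothetical stand-in for a generated LeetCode-style solution."""
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []


if __name__ == "__main__":
    stats = profile_solution(two_sum, [2, 7, 11, 15], 9)
    print(f"runtime: {stats['runtime_s']:.6f} s, peak memory: {stats['peak_bytes']:.0f} bytes")
```

Given an externally measured energy value for a generated solution, `energy_change_pct(value)` returns a negative number when the solution uses less energy than the baseline and a positive number when it uses more.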

Paper Structure

This paper contains 24 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overall Methodology
  • Figure 2: Energy consumption of four SLMs under different prompting strategies compared to the baseline (1.7122 mWh, red dashed line).
  • Figure 3: Minimum energy consumption observed for each model under its most efficient prompting strategy, shown in comparison to the baseline (1.7122 mWh).