Sustainability Analysis of Prompt Strategies for SLM-based Automated Test Generation

Pragati Kumari, Novarun Deb

Abstract

The growing adoption of prompt-based automation in software testing raises important questions about its computational and environmental sustainability. Existing sustainability studies in AI-driven testing focus primarily on large language models, leaving the impact of prompt engineering strategies largely unexplored, particularly in the context of Small Language Models (SLMs). This gap is critical, as prompt design directly influences inference behavior, execution cost, and resource utilization, even when model size is fixed. To the best of our knowledge, this paper presents the first systematic sustainability evaluation of prompt engineering strategies for automated test generation using SLMs. We analyze seven prompt strategies across three open-source SLMs under a controlled experimental setup. Our evaluation jointly considers execution time, token usage, energy consumption, carbon emissions, and test quality, the latter assessed through coverage analysis of the generated test scripts. The results show that prompt strategies have a substantial and independent impact on sustainability outcomes, often outweighing the effect of model choice. Reasoning-intensive strategies such as Chain of Thought and Self-Consistency achieve higher coverage but incur significantly higher execution time, energy consumption, and carbon emissions. In contrast, simpler strategies such as Zero-Shot and ReAct deliver competitive test quality at markedly lower environmental cost, while Least-to-Most and Program of Thought offer balanced trade-offs.
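
As a concrete illustration of how the per-run sustainability signals named above (execution time, energy consumption, and carbon emissions) can be collected during test generation, the sketch below uses the codecarbon library for energy and emissions tracking. It is a minimal sketch under our own assumptions, not the authors' measurement harness; `generate_tests`, `profile_strategy`, the project name, and the returned field names are hypothetical.

```python
# Minimal sketch (assumption, not the paper's harness): profile one prompt
# strategy on one SLM, recording execution time and CO2 emissions via codecarbon.
import time
from codecarbon import EmissionsTracker

def profile_strategy(strategy_name: str, prompt: str, generate_tests) -> dict:
    """Run one generation call and return its sustainability profile.

    `generate_tests` is a hypothetical callable wrapping local SLM inference
    and returning the generated test scripts as a string.
    """
    tracker = EmissionsTracker(project_name=f"slm-testgen-{strategy_name}")
    tracker.start()
    t0 = time.perf_counter()

    test_code = generate_tests(prompt)

    elapsed_s = time.perf_counter() - t0
    emissions_kg = tracker.stop()  # kg CO2-eq; codecarbon also logs energy (kWh)
                                   # and duration to its emissions.csv output

    return {
        "strategy": strategy_name,
        "execution_time_s": elapsed_s,
        "emissions_kg_co2": emissions_kg,
        "generated_tests": test_code,
    }
```

Coverage of the generated test scripts would then be measured separately and joined with these per-run records to compare strategies.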


Paper Structure

This paper contains 31 sections, 11 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: The experiment framework.
  • Figure 2: Execution time, energy consumption, and normalized coverage characteristics observed across different prompt strategies.
  • Figure 3: TokRate (token throughput).
  • Figure 4: Normalized sustainability metrics per 1K generated tokens: (a) execution time in seconds, (b) carbon emissions, and (c) energy consumption.
  • Figure 5: Coverage quality-efficiency metrics across prompt strategies: (a) normalized coverage per 1K tokens, (b) coverage per kWh, and (c) coverage per unit of $CO_2$ emissions.
  • ...and 1 more figure
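
The per-token and efficiency normalizations named in the captions of Figures 4 and 5 presumably reduce to simple ratios of the raw measurements. The formulas below are our reading of those captions, not definitions quoted from the paper; here $X$ stands for execution time, energy, or emissions, $N_{tok}$ for the number of generated tokens, $E$ for energy in kWh, and $M_{CO_2}$ for the emitted $CO_2$ mass.

$$X_{\text{per 1K tok}} = \frac{X}{N_{tok}/1000}, \qquad \text{Cov}_{\text{per kWh}} = \frac{\text{Coverage}}{E}, \qquad \text{Cov}_{\text{per } CO_2} = \frac{\text{Coverage}}{M_{CO_2}}$$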