Good for the Planet, Bad for Me? Intended and Unintended Consequences of AI Energy Consumption Disclosure

Michael Klesel, Uwe Messer

Abstract

To address the high energy consumption of artificial intelligence, energy consumption disclosure (ECD) has been proposed as a way to steer users toward more sustainable practices, such as choosing efficient small language models (SLMs) over large language models (LLMs). This presents users with a performance-sustainability trade-off. In an experiment with 365 participants, we explore the impact of ECD and the perceptual and behavioral consequences of choosing an SLM over an LLM. Our findings reveal that ECD is a highly effective measure for nudging individuals toward a pro-environmental choice, increasing the odds of choosing an energy-efficient SLM over an LLM by a factor of more than 12. Interestingly, this choice did not significantly affect subsequent behavior: individuals who selected an SLM and those who selected an LLM showed similar prompting behavior. Nevertheless, the choice created a perceptual bias. A placebo effect emerged, with individuals who selected the "eco-friendly" SLM reporting significantly lower satisfaction and perceived quality. These results highlight the double-edged nature of ECD, which has critical implications for the design of sustainable human-computer interactions.
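
The headline effect compares the odds of choosing the SLM with and without ECD, which can be summarized as a 2×2 table of condition by model choice. Below is a minimal sketch of the unadjusted version of that comparison in Python, using only the standard library. The function names are hypothetical, the counts in the usage lines are invented for illustration and are not the study's data, and the reported odds figure may come from a different estimator (for example, a regression model). Note that SciPy's chi2_contingency applies Yates' continuity correction to 2×2 tables by default, whereas this sketch does not.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unadjusted odds ratio for a 2x2 table laid out as

                     SLM   LLM
        treatment     a     b
        control       c     d

    i.e. the odds of choosing the SLM with ECD divided by the odds without it.
    """
    if b * c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table (df = 1),
    without Yates' continuity correction. Returns (statistic, p-value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, chi2 = Z**2, so P(chi2 > s) = P(|Z| > sqrt(s)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Invented counts for illustration only (not the study's data).
print(odds_ratio(40, 60, 10, 90))   # 6.0
print(chi2_2x2(40, 60, 10, 90)[0])  # 24.0
```

An odds ratio of 6.0 here would mean the odds of picking the SLM are six times higher in the treatment row than in the control row; the chi-square statistic tests whether choice and condition are independent at all.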

Paper Structure (25 sections, 7 figures, 4 tables)

Figures (7)

  • Figure 1: This figure presents an analogy between the luminous efficiency of lightbulbs and the trade-offs in language models. On the left, different lightbulbs produce an identical output (e.g., 1000 lumens) but vary significantly in their energy efficiency. On the right, language models (LLMs and SLMs) differ not only in energy efficiency but also in absolute performance (e.g., MMLU scores of 43.9 vs. 25.9). Critically, unlike the standardized lumen, most users lack the expertise to accurately estimate the true performance difference ($\Delta$) between these models. Error bars indicate variation in energy consumption arising from differences in the hardware and infrastructure used for inference (jegham2025hungry).
  • Figure 2: Research Model
  • Figure 3: The experimental manipulation shown to participants. The control condition (left) displayed only performance ratings, while the treatment condition (right) included an energy efficiency score (A--G), creating a choice between performance and sustainability.
  • Figure 4: ECD altered the choices made by participants. The count plot shows the number of individuals in the control (n = 192) and treatment (n = 173) groups who selected either the LLM or the SLM. The difference in the distribution of choices between the two groups was statistically significant ($\chi^2(1, N = 365) = 269.63$, $p < .0001$).
  • Figure 5: Differences in behavior and perception based on model choice. The results are based on the N = 173 participants who received the treatment. The figure shows four boxplots comparing key dependent variables for participants grouped by their choice of the LLM (n = 105) or the SLM (n = 68): (A) Average Tokens per Prompt, (B) Number of Prompts, (C) Perceived Satisfaction, and (D) Perceived Quality. Pairwise comparisons were conducted using Mann--Whitney U tests with Holm--Bonferroni correction. Significance levels are denoted as $^*p \leq 0.05$, $^{**}p \leq 0.01$, $^{***}p \leq 0.001$, and $^{****}p \leq 0.0001$.
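
The Figure 5 comparisons pair a rank-based two-sample test with a family-wise error correction. Below is a minimal standard-library sketch of both in Python, assuming the normal approximation to the Mann-Whitney U distribution (with tie and continuity corrections) and Holm's step-down adjustment applied across a family of p-values, such as one per panel. The function names are hypothetical. Statistical packages may use an exact distribution for small samples, so their p-values can differ slightly from this approximation.

```python
import math
from collections import Counter


def rank_average(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value via the normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = rank_average(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie-corrected variance of U under the null hypothesis.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0
    mu = n1 * n2 / 2
    z = max(abs(u1 - mu) - 0.5, 0.0) / sigma  # continuity correction
    return u1, min(1.0, math.erfc(z / math.sqrt(2)))


def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted
```

For example, holm_adjust([0.01, 0.04, 0.03]) gives approximately [0.03, 0.06, 0.06]: the third p-value is doubled to 0.06, and the running maximum then lifts the largest raw p-value to that same level.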