This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs

Lars Krupp, Daniel Geißler, Francisco M. Calatrava-Nicolas, Vishal Banwari, Paul Lukowicz, Jakob Karolus

Abstract

The energy consumption of Large Language Models (LLMs) is raising concerns because of its adverse effects on environmental stability and resource use. Yet, these energy costs remain largely opaque to users, especially when models are accessed through an API -- a black box in which all information depends on what providers choose to disclose. In this work, we investigate inference time measurements as a proxy to approximate the associated energy costs of API-based LLMs. We ground our approach by comparing our estimations with actual energy measurements from locally hosted equivalents. Our results show that time measurements allow us to infer GPU models for API-based LLMs, grounding our energy cost estimations. Our work aims to create means for understanding the associated energy costs of API-based LLMs, especially for end users.

Paper Structure (16 sections, 1 equation, 1 figure, 4 tables)

Figures (1)

  • Figure 1: Boxplot showing the computation time per token $\bar{T}_{token}$ in seconds for running the same benchmark on both models across different local GPUs and API executions. Cluster A includes A100 GPUs, while Cluster H groups H100 and H200 GPUs.
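To make the quantity in Figure 1 concrete, the following is a minimal sketch of how a per-token computation time $\bar{T}_{token}$ could be derived from timed runs and turned into a rough energy figure. It is not the authors' method: the per-run averaging, the function names `time_per_token` and `energy_estimate_joules`, and the `assumed_power_w` parameter are all assumptions made for illustration. In particular, the power draw must be supplied by the reader (for example from a measurement on a locally hosted GPU); no value is taken from the paper.

```python
from statistics import mean


def time_per_token(durations_s, token_counts):
    """Mean computation time per token (seconds) across several runs.

    Assumption: each run contributes duration / tokens, and the runs are
    averaged with equal weight. The paper may aggregate differently.
    """
    if len(durations_s) != len(token_counts):
        raise ValueError("durations_s and token_counts must have equal length")
    per_run = [d / n for d, n in zip(durations_s, token_counts) if n > 0]
    if not per_run:
        raise ValueError("at least one run with a positive token count is required")
    return mean(per_run)


def energy_estimate_joules(t_token_s, n_tokens, assumed_power_w):
    """Rough energy estimate: time per token x tokens x assumed power draw.

    assumed_power_w is a hypothetical, user-supplied average power in watts;
    the result is only as good as that assumption.
    """
    return t_token_s * n_tokens * assumed_power_w


# Illustrative numbers only, not data from the paper.
t_token = time_per_token([2.0, 4.0], [100, 200])
energy_j = energy_estimate_joules(t_token, n_tokens=1000, assumed_power_w=300.0)
print(f"time per token: {t_token:.4f} s, estimated energy: {energy_j:.1f} J")
```

Keeping the power draw as an explicit parameter reflects the idea in the abstract that timing can only ground an energy estimate once the underlying GPU model has been inferred.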