This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs

Lars Krupp; Daniel Geißler; Francisco M. Calatrava-Nicolas; Vishal Banwari; Paul Lukowicz; Jakob Karolus

Abstract

The energy consumption of Large Language Models (LLMs) is raising growing concern because of its adverse effects on environmental stability and resource use. Yet these energy costs remain largely opaque to users, especially when models are accessed through an API -- a black box in which all information depends on what providers choose to disclose. In this work, we investigate inference time measurements as a proxy for approximating the energy costs of API-based LLMs. We ground our approach by comparing our estimates with actual energy measurements from locally hosted equivalents. Our results show that time measurements allow us to infer the GPU models behind API-based LLMs, which in turn grounds our energy cost estimates. Our work aims to provide a means of understanding the energy costs of API-based LLMs, especially for end users.

Paper Structure (16 sections, 1 equation, 1 figure, 4 tables)

Introduction
Related Work
Methodology
Experiment Protocol
Local Hardware Architecture Selection
Local LLM Initialization
Local Energy Tracking
API-Based Energy Estimation
Results
Benchmarking Computation Time
Translating Time to Energy
Estimating the Energy Consumption for API-Based LLMs
Discussion
Limitations
Conclusion
...and 1 more sections

Figures (1)

Figure 1: Boxplot showing the computation time per token $\bar{T}_{token}$ in seconds for running the same benchmark on both models across different local GPUs and API executions. Cluster A includes A100 GPUs, while Cluster H groups H100 and H200 GPUs.
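
As a rough illustration of how a per-token time such as $\bar{T}_{token}$ might be turned into an energy estimate, the following minimal Python sketch matches an API-side time measurement to the closest locally profiled GPU and reuses that GPU's locally measured energy per token. Everything in it is an assumption made for illustration: the function names, the averaging of per-request ratios, the nearest-value matching rule, and all numeric values, which are hypothetical placeholders rather than results from the paper.

```python
from statistics import mean

# Hypothetical placeholder profiles, NOT measurements from the paper.
# Each entry pairs a locally measured time per token with a locally
# measured energy per token for one GPU type.
LOCAL_PROFILES = {
    "A100": {"time_per_token_s": 0.030, "energy_per_token_j": 9.0},
    "H100": {"time_per_token_s": 0.018, "energy_per_token_j": 8.0},
}


def mean_time_per_token(durations_s, token_counts):
    """Average seconds per generated token over several requests.

    Assumption: the mean is taken over per-request ratios.
    """
    return mean(d / n for d, n in zip(durations_s, token_counts))


def closest_gpu(api_time_per_token_s, profiles=LOCAL_PROFILES):
    """Return the profiled GPU whose time per token is nearest the API's."""
    return min(
        profiles,
        key=lambda gpu: abs(profiles[gpu]["time_per_token_s"] - api_time_per_token_s),
    )


def estimate_energy_j(api_time_per_token_s, n_tokens, profiles=LOCAL_PROFILES):
    """Estimate energy in joules for n_tokens using the matched GPU's profile."""
    gpu = closest_gpu(api_time_per_token_s, profiles)
    return gpu, profiles[gpu]["energy_per_token_j"] * n_tokens


if __name__ == "__main__":
    t = mean_time_per_token([2.0, 4.0], [100, 200])
    print(estimate_energy_j(t, n_tokens=100))
```

Nearest-value matching is the simplest possible rule; with overlapping distributions like those a boxplot can reveal, a cluster-level or distribution-based comparison would likely be more robust.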
