Energy-Aware LLMs: A step towards sustainable AI for downstream applications

Nguyen Phuc Tran, Brigitte Jaumard, Oscar Delgado

TL;DR

This work addresses the growing energy footprint of large language models deployed for downstream tasks in communication networks. It introduces an end-to-end energy-performance pipeline that integrates quantization and pruning, evaluates it on real root cause analysis (RCA) and QA datasets, and measures energy with CodeCarbon on a GAIA Ericsson data-center chassis. A ranking mechanism, $R(model_i) = w \cdot \varphi(model_i) + (1-w) \cdot \rho(model_i)$, combines the energy efficiency $\varphi$ and the mean task performance $\rho$ of each model under a weight $w$ to select the top models for inference; pruning strategies are then explored to reduce resource use further. Key findings show that 16-bit quantization offers strong energy savings with minimal performance loss, and that a 16-bit model with 2:4 structured pruning can achieve the best energy-performance balance. The results have practical implications for reducing data-center energy usage without sacrificing critical downstream capabilities, and they can guide deployment decisions in resource-constrained environments.
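The ranking score described in the TL;DR can be illustrated with a minimal Python sketch. It assumes that both the energy-efficiency score $\varphi$ and the mean task-performance score $\rho$ are already normalized to $[0, 1]$ with higher values being better, and that $w$ is a weight in $[0, 1]$. The summary does not spell out those details, so treat them as assumptions. The `Candidate` class, the function names, and all numeric values are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A model variant with two pre-normalized scores (assumed in [0, 1])."""
    name: str
    energy_efficiency: float  # phi: higher means more energy-efficient (assumption)
    performance: float        # rho: mean task performance (assumption)


def rank_score(candidate: Candidate, w: float) -> float:
    """Compute R = w * phi + (1 - w) * rho for one candidate."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * candidate.energy_efficiency + (1.0 - w) * candidate.performance


def rank_models(candidates: list[Candidate], w: float) -> list[Candidate]:
    """Return the candidates sorted from highest to lowest ranking score."""
    return sorted(candidates, key=lambda c: rank_score(c, w), reverse=True)


# Illustrative values only; they are not measurements from the paper.
candidates = [
    Candidate("model_a", energy_efficiency=0.9, performance=0.6),
    Candidate("model_b", energy_efficiency=0.5, performance=0.9),
    Candidate("model_c", energy_efficiency=0.8, performance=0.8),
]

if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        order = [c.name for c in rank_models(candidates, w)]
        print(f"w={w}: {order}")
```

Setting `w` close to 1 favors energy efficiency, while `w` close to 0 favors task performance; intermediate values trade one off against the other.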

Abstract

Advanced Large Language Models (LLMs) have revolutionized various fields, including communication networks, sparking a wave of innovation that has led to new applications and services and to significantly enhanced solution schemes. Despite these impressive developments, most LLMs require huge computational resources, resulting in very high energy consumption. This study therefore proposes an end-to-end pipeline that investigates the trade-off between energy efficiency and model performance for an LLM during fault ticket analysis in communication networks. It further evaluates the pipeline using two real-world datasets for the tasks of root cause analysis and response feedback in a communication network. Our results show that an appropriate combination of quantization and pruning techniques can reduce energy consumption while significantly improving model performance.

Paper Structure

This paper contains 9 sections, 1 equation, 6 figures.

Figures (6)

  • Figure 1: Distribution of token lengths in datasets.
  • Figure 2: Energy-Performance pipeline evaluation for LLMs.
  • Figure 3: Energy consumption vs. training loss on each epoch.
  • Figure 4: Energy consumption vs. carbon emission vs. model precision in the inference phase.
  • Figure 5: Impact of quantization levels.
  • ...and 1 more figure