Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights

Tahniat Khan, Soroor Motie, Sedef Akinli Kocak, Shaina Raza

TL;DR

This paper tackles the sustainability challenge of large language models by quantifying their energy consumption and carbon emissions across training and deployment, and it proposes a practical optimization framework grounded in Green AI. The framework combines targeted 4-bit quantization with local, on-device inference to reduce energy use. It is demonstrated through a case study on sustainable LLM deployment focused on edge-based operation. The framework evaluates emissions metrics, energy use, and accuracy, and shows carbon-footprint reductions of up to 45% per inference while maintaining acceptable performance. The work offers actionable guidance for deploying energy-efficient LLMs in resource-constrained environments and points to further improvements in adaptive optimization and robustness.
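
To make "4-bit quantization" concrete, here is a minimal sketch of symmetric absmax quantization to signed 4-bit integers, with one floating-point scale per small group of weights. This is an illustrative assumption, not the scheme used in the case study, which this summary does not specify. Production 4-bit formats such as NF4 or llama.cpp's GGUF quantizations differ in detail. The function names `quantize_int4` and `dequantize_int4` and the `group_size` parameter are hypothetical.

```python
# Hypothetical sketch: symmetric absmax int4 quantization with per-group scales.
# Not the paper's method; shown only to illustrate what 4-bit weights look like.
from typing import List, Tuple

QMAX = 7  # symmetric signed 4-bit range [-7, 7] (the code -8 is left unused)


def quantize_int4(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map float weights to 4-bit integer codes, storing one float scale per group."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        absmax = max(abs(w) for w in group)
        # Largest-magnitude weight in the group maps to +/-QMAX; avoid divide-by-zero.
        scale = absmax / QMAX if absmax > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-QMAX, min(QMAX, q)))  # clamp to the 4-bit range
    return codes, scales


def dequantize_int4(codes: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Reconstruct approximate float weights from codes and their group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


if __name__ == "__main__":
    w = [0.12, -0.53, 0.91, 0.07, -1.40, 0.33, 0.02, 0.75]  # made-up weights
    codes, scales = quantize_int4(w)
    approx = dequantize_int4(codes, scales)
    print("codes:", codes)
    print("max abs error:", max(abs(a - b) for a, b in zip(w, approx)))
```

Each weight now occupies 4 bits instead of 16 (FP16), plus a small per-group scale overhead. The rounding error per weight is bounded by half of its group's scale. Less memory to store and move per token is the usual reason quantization lowers inference energy, though the actual savings depend on hardware and kernels.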

Abstract

The rapid adoption of large language models (LLMs) has led to significant energy consumption and carbon emissions, posing a critical challenge to the sustainability of generative AI technologies. This paper explores the integration of energy-efficient optimization techniques into the deployment of LLMs to address these environmental concerns. We present a case study and framework that demonstrate how strategic quantization and local inference techniques can substantially lower the carbon footprint of LLMs without compromising their operational effectiveness. Experimental results show that these methods can reduce energy consumption and carbon emissions by up to 45% after quantization, making them particularly suitable for resource-constrained environments. The findings provide actionable insights for achieving sustainability in AI while maintaining high levels of accuracy and responsiveness.
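
The reduction figures above rest on simple arithmetic that links measured power to emissions. The sketch below uses the standard operational-emissions formula: energy times grid carbon intensity. It assumes, without confirmation from this summary, that the paper's four equations follow the same pattern. The helper names and every numeric input are hypothetical placeholders, not results from the paper.

```python
# Hypothetical sketch of operational-emissions arithmetic (inputs are placeholders):
#   energy (kWh)  = average power (W) x duration (s) / 3.6e6
#   emissions (g) = energy (kWh) x grid carbon intensity (gCO2e/kWh)
#   reduction (%) = (baseline - optimized) / baseline x 100

JOULES_PER_KWH = 3.6e6


def energy_kwh(avg_power_w: float, duration_s: float) -> float:
    """Energy for one inference from average power draw and wall-clock time."""
    return avg_power_w * duration_s / JOULES_PER_KWH


def emissions_g(energy: float, carbon_intensity_g_per_kwh: float) -> float:
    """Operational CO2-equivalent emissions in grams."""
    return energy * carbon_intensity_g_per_kwh


def percent_reduction(baseline: float, optimized: float) -> float:
    """Relative reduction of `optimized` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100.0


if __name__ == "__main__":
    intensity = 400.0  # hypothetical grid carbon intensity, gCO2e/kWh
    base = emissions_g(energy_kwh(avg_power_w=200.0, duration_s=2.0), intensity)
    quant = emissions_g(energy_kwh(avg_power_w=150.0, duration_s=1.5), intensity)
    print(f"baseline: {base:.4f} g, quantized: {quant:.4f} g")
    print(f"reduction: {percent_reduction(base, quant):.1f}%")
```

When both runs draw from the same grid, the carbon intensity cancels out of the ratio. In that case the percentage reduction in emissions equals the percentage reduction in energy, which is consistent with reporting a single "up to 45%" figure for both.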

Paper Structure

This paper contains 25 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Detailed Overview of the Proposed Optimization Framework
  • Figure 2: Sentiment Assessment Instructions and Indicators Checklist
  • Figure 3: Key Examples of Sentiment Analysis Experiments