Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari

TL;DR

Sprout proposes generation directives as a new lever for reducing the carbon footprint of generative LLM inference without sacrificing quality. It formulates a system-level linear program that assigns directive levels to requests probabilistically, guided by real-time grid carbon intensity and by offline quality feedback from an evaluator; these evaluations are scheduled opportunistically to minimize overhead. Empirical evaluation on Llama2-13B across multiple regions shows per-request carbon reductions of up to 60% while consistently maintaining high-quality outputs, approaching Oracle-like performance. The approach offers a scalable path toward sustainable GenAI, with broad implications for deployment, infrastructure planning, and policy around green AI.

Abstract

The rapid advancement of Generative Artificial Intelligence (GenAI) across diverse sectors raises significant environmental concerns, notably the carbon emissions from their cloud and high performance computing (HPC) infrastructure. This paper presents Sprout, an innovative framework designed to address these concerns by reducing the carbon footprint of generative Large Language Model (LLM) inference services. Sprout leverages the innovative concept of "generation directives" to guide the autoregressive generation process, thereby enhancing carbon efficiency. Our proposed method meticulously balances the need for ecological sustainability with the demand for high-quality generation outcomes. Employing a directive optimizer for the strategic assignment of generation directives to user prompts and an original offline quality evaluator, Sprout demonstrates a significant reduction in carbon emissions by over 40% in real-world evaluations using the Llama2 LLM and global electricity grid data. This research marks a critical step toward aligning AI technology with sustainable practices, highlighting the potential for mitigating environmental impacts in the rapidly expanding domain of generative artificial intelligence.
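
To make the idea of probabilistic directive assignment from the TL;DR more concrete, the following is a minimal sketch, not the paper's actual formulation. It assumes a toy linear program: choose a probability for each directive level so that expected carbon per request is minimized while expected quality stays above a floor. The function name `assign_directive_mix`, the energy and quality numbers, and the quality floor are all hypothetical. Carbon intensity only scales the objective here; how Sprout couples intensity with the quality constraint is not stated in this summary.

```python
from itertools import combinations


def assign_directive_mix(carbon_per_level, quality_per_level, min_quality):
    """Toy LP (hypothetical, not Sprout's exact program):
        minimize   sum_i x_i * carbon_i
        subject to sum_i x_i * quality_i >= min_quality
                   sum_i x_i = 1,  x_i >= 0
    With one constraint besides the probability simplex, an optimal
    vertex uses at most two levels, so we enumerate those vertices."""
    n = len(carbon_per_level)
    best = None  # (probabilities, expected_carbon)

    def consider(probs):
        nonlocal best
        cost = sum(p * c for p, c in zip(probs, carbon_per_level))
        if best is None or cost < best[1]:
            best = (probs, cost)

    # Vertices that use a single level meeting the quality floor.
    for i in range(n):
        if quality_per_level[i] >= min_quality:
            probs = [0.0] * n
            probs[i] = 1.0
            consider(probs)

    # Vertices that mix two levels so expected quality equals the floor.
    for i, j in combinations(range(n), 2):
        qi, qj = quality_per_level[i], quality_per_level[j]
        if qi == qj:
            continue
        w = (min_quality - qj) / (qi - qj)  # probability of level i
        if 0.0 < w < 1.0:
            probs = [0.0] * n
            probs[i], probs[j] = w, 1.0 - w
            consider(probs)

    if best is None:
        raise ValueError("no directive mix reaches the quality floor")
    return best


# Hypothetical inputs: level 0 = no directive, higher = more concise.
energy_kwh = [0.0040, 0.0025, 0.0015]   # assumed energy per request
carbon_intensity = 300.0                 # assumed grid gCO2 per kWh
carbon_g = [e * carbon_intensity for e in energy_kwh]
quality = [1.00, 0.95, 0.80]             # assumed evaluator scores

probs, expected_carbon = assign_directive_mix(carbon_g, quality, 0.90)
print(probs, expected_carbon)
```

Under these assumed numbers the cheapest feasible mix splits requests between the two more concise levels rather than applying a single level to every request, which is why a probabilistic assignment can beat any fixed choice.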

Paper Structure

This paper contains 17 sections, 5 equations, 16 figures, 2 tables.

Figures (16)

  • Figure 1: The auto-regressive generation process of generative language model inference.
  • Figure 2: Two factors that impact a request's carbon footprint during LLM inference: (a) the number of model parameters and (b) the number of generated tokens.
  • Figure 3: (a) Using generation directives can control the number of generated tokens while providing accurate responses. (b) Hosting larger models (e.g., Llama2 13B) with generation directives is better than hosting smaller models (e.g., Llama2 7B) in terms of both carbon emission and correctness.
  • Figure 4: Applying generation directives across different applications reveals variability in sensitivity to these directives, impacting both carbon emissions and the accuracy of the generated content.
  • Figure 5: System Design Overview of Sprout.
  • ...and 11 more figures

Theorems & Definitions (1)

  • Definition 1