The Unseen AI Disruptions for Power Grids: LLM-Induced Transients

Yuzhuo Li, Mariam Mughees, Yize Chen, Yunwei Ryan Li

TL;DR

This paper examines the scale of AI power consumption, analyzes AI transient behaviour in various scenarios, develops high-level mathematical models of AI workload behaviour, and discusses the multifaceted challenges and opportunities these workloads may bring to existing power grids.

Abstract

Recent breakthroughs in large language models (LLMs) have demonstrated superior capability across major industries and stimulated multi-hundred-billion-dollar investment in AI-centric data centers over the next 3-5 years. This, in turn, raises growing concerns about sustainability and AI-related energy usage. However, there is a largely overlooked issue as challenging and critical as AI model and infrastructure efficiency: the disruptive dynamic power consumption behaviour. With fast, transient dynamics, AI infrastructure features ultra-low inertia, sharp power surges and dips, and a significant peak-idle power ratio. The power scale ranges from several hundred watts to megawatts, and even to gigawatts. These unprecedented characteristics make AI a unique load and pose threats to power grid reliability and resilience. To reveal this hidden problem, this paper examines the scale of AI power consumption, analyzes AI transient behaviour in various scenarios, develops high-level mathematical models to depict AI workload behaviour, and discusses the multifaceted challenges and opportunities these workloads may bring to existing power grids. Given the rapid evolution of machine learning (ML) and AI technologies, this work emphasizes the critical need for interdisciplinary approaches to ensure reliable and sustainable AI infrastructure development, and provides a starting point for researchers and practitioners to tackle such challenges.

Paper Structure (70 sections, 19 equations, 18 figures, 6 tables)

Figures (18)

  • Figure 1: Reported energy consumption of training different LLMs with respect to the number of model parameters [de2023growing, luccioni2023estimating, mlenergy2024leaderboard, llama2024herd, elmeleegy2024demystifying]. Note that the values shown are positioned relative to one another and are not based on accurate numerical calculation. The exact energy consumption can differ dramatically across AI acceleration hardware and training and inference settings. * denotes energy consumption estimated from model size.
  • Figure 2: The schematic topology of an AI server with 8 GPUs.
  • Figure 3: The relationship between typical LLM pipelines and associated hardware processes.
  • Figure 4: Different parallelization paradigms for AI/ML tasks [jia2019beyond].
  • Figure 5: Details of a data center electrical system (an example from Google [google2024datacenters]).
  • ...and 13 more figures