AI-focused HPC Data Centers Can Provide More Power Grid Flexibility and at Lower Cost

Yihong Zhou, Angel Paredes, Chaimaa Essayeh, Thomas Morstyn

TL;DR

This study addresses the challenge of rising AI-driven HPC power demand by evaluating how AI-focused data centers can provide grid-flexibility services at lower cost than traditional CPU-heavy HPC centers. It develops a MILP-based framework to quantify maximum data-center flexibility across services and introduces a linear cost model to estimate the profitability of offering flexibility, with cost-scaling factors informed by cloud-pricing data. Across 14 real-world data centers and multiple services, AI-focused centers show greater long-duration flexibility and substantially lower flexibility costs (about 50% on average) than general-purpose centers, with dynamic quotas further enhancing flexibility at modest energy costs. The work provides scalable algebraic formulas to extrapolate results to other centers, enabling rapid assessment for operators and policymakers, and it offers practical guidance for integrating data centers into grid-flexibility markets.

Abstract

The recent growth of Artificial Intelligence (AI), particularly large language models, requires energy-demanding high-performance computing (HPC) data centers, which poses a significant burden on power system capacity. Scheduling data center computing jobs to manage power demand can alleviate network stress with minimal infrastructure investment and contribute to fast time-scale power system balancing. This study, for the first time, comprehensively analyzes the capability and cost of grid flexibility provision by GPU-heavy AI-focused HPC data centers, along with a comparison with CPU-heavy general-purpose HPC data centers traditionally used for scientific computing. A data center flexibility cost model is proposed that accounts for the value of computing. Using real-world computing traces from 7 AI-focused HPC data centers and 7 general-purpose HPC data centers, along with computing prices from 3 cloud platforms, we find that AI-focused HPC data centers can offer greater flexibility at 50% lower cost compared to general-purpose HPC data centers for a range of power system services. By comparing the cost to flexibility market prices, we illustrate the financial profitability of flexibility provision for AI-focused HPC data centers. Finally, our flexibility and cost estimates can be scaled using parameters of other data centers through algebraic operations, avoiding the need for re-optimization.

Paper Structure

This paper contains 33 sections, 33 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Baseline utilization time series and maximum amount of flexibility for two selected data centers. (a) Utilization of the AI-focused HPC data center (Saturn in Table \ref{tab:summary}). (b) Utilization of the general-purpose HPC data center (ORNL in Table \ref{tab:summary}). The black lines represent the utilization upper bound, while the green lines show the utilization time series. (c) and (d) are for the maximum amounts of flexibility for power system services with different duration and frequency requirements. (c) Results for the AI-focused HPC data center (Saturn). (d) Results for the general-purpose HPC data center (ORNL). The maximum flexibility amounts are normalized ('Norm. Flex') as ratios of the maximum power of the data center. The heat maps in each column are subject to a specific maximum delay limit (max delay), which is the maximum delay time proportional to the job computing time. "min", "max", and "mean" refer to the minimum, maximum, and mean values across all blocks in each heat map.
  • Figure 2: Average flexibility cost and power system service prices. In (a)-(f), each heatmap displays the average cost of providing different percentages of the maximum amounts of flexibility evaluated in Figs. \ref{fig:normalised_flex_AI} and \ref{fig:normalised_flex_General}. The maximum delay limit is set to 20%. "min", "max", and "mean" refer to the minimum, maximum, and mean values across all blocks in each heatmap. Plots (a), (c), and (e) correspond to the AI-focused HPC data center (Saturn) under 25-th (P25), 50-th (P50), and 75-th (P75) percentiles of the cost scaling factor, estimated using data from Google Cloud, AWS, and Oracle. Plots (b), (d), and (f) correspond to the general-purpose HPC data center (ORNL). Plot (g) shows percentiles of real-world power system service prices from Germany's aFRR \cite{aFRR_German}, Australia's NEM \cite{AEMO_price_data}, and UK's DFS \cite{dfs}.
  • Figure 3: Samples of the cost scaling factor and the elements ($A$, $R$, and $G$) involved in its computation (see Eq. \ref{eq:scale_ACoF_nodq}). These samples are derived from data on computing rental options on Google Cloud, AWS, and Oracle. The 25th, 50th, and 75th percentiles of the cost scaling factor for general-purpose HPC data centers (General) are 10.60, 24.49, and 51.42 respectively. The 25th, 50th, and 75th percentiles of the cost scaling factor for AI-focused HPC data centers (AI) are 2.03, 11.37, and 40.48 respectively. $A$ is the price reduction coefficient that represents the proportionate price reduction in response to a certain proportion of job delay. $R$ is the hourly price of a single virtual CPU (vCPU, for general-purpose HPCs) or a single GPU (for AI-focused HPCs). $G$ is the power of a single vCPU or a single GPU. These parameters are interpreted in Section \ref{sec:method}. On most cloud platforms, CPUs are rented on the basis of vCPUs, each of which typically represents one thread of a physical CPU. We follow their convention here. The power of a vCPU ($G$) is calculated by dividing the rated power of the physical CPU by the number of vCPUs. Note that, in the plots of $R$, $G$, and $R/G$, when a set of computing rental options essentially rents different portions of the same type of machine, we only keep one option to avoid over-counting the same machine for its power and price parameter.
  • Figure 4: Average flexibility cost under the cost scaling factor derived by Lambda GPU Cloud data \cite{lambda_gpu_cloud}. These cost results are for the AI-focused HPC data center (Saturn) when providing different percentages of the maximum amounts of flexibility evaluated in Fig. \ref{fig:normalised_flex_AI}. The maximum delay limit is set to 20%. "min", "max", and "mean" refer to the minimum, maximum, and mean values across all blocks in each heatmap. (a), (b), and (c) are for 25-th, 50-th, and 75-th percentiles of the cost scaling factor respectively.
  • Figure 5: Analysis of all the 14 HPC data centers. (a) Baseline utilization time series (green lines) of the 7 AI-focused HPC data centers. (b) Baseline utilization time series (green lines) of the 7 general-purpose HPC data centers. The black lines refer to the utilization upper bound. (c) and (d): The normalized maximum amount of flexibility (Norm. flexibility) versus the average flexibility cost (ave. flex. cost) when providing 100% of the maximum flexibility for two power system services. We use the 50-th percentile (P50) of the cost scaling factor (estimated using data from Google Cloud, AWS, and Oracle). (c) Results for providing primary response (duration of 0.25 hours and frequency of 2920 times/year); (d) Results for providing congestion management (duration of 2 hours and frequency of 365 times/year). Dots closer to the top-left corner indicate better flexibility providers in terms of greater flexibility and lower cost.
  • ...and 6 more figures
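
The Figure 3 caption describes two bookkeeping steps behind the $R$, $G$, and $R/G$ samples: the power of a vCPU is the rated power of the physical CPU divided by its number of vCPUs, and rental options that are portions of the same machine type are counted once. The Python sketch below illustrates only those two steps. The `RentalOption` class, its field names, and all numbers are hypothetical and are not taken from the paper or from any cloud platform; the cost scaling factor itself is defined by the paper's equation and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RentalOption:
    """Hypothetical record for one cloud rental option (illustrative only)."""
    machine_type: str    # identifier of the underlying machine type
    hourly_price: float  # price of the whole option, in $/hour
    n_units: int         # number of vCPUs (or GPUs) in the option
    unit_power_w: float  # G: power of one vCPU or one GPU, in watts


def vcpu_power(cpu_rated_power_w: float, n_vcpus: int) -> float:
    """G for a vCPU: rated power of the physical CPU divided by its vCPU count."""
    return cpu_rated_power_w / n_vcpus


def unit_price(option: RentalOption) -> float:
    """R: hourly price of a single vCPU or GPU."""
    return option.hourly_price / option.n_units


def dedupe_by_machine(options: list[RentalOption]) -> list[RentalOption]:
    """Keep the first option seen for each machine type to avoid over-counting."""
    first_seen: dict[str, RentalOption] = {}
    for opt in options:
        first_seen.setdefault(opt.machine_type, opt)
    return list(first_seen.values())


def price_per_watt(options: list[RentalOption]) -> list[float]:
    """R/G samples, one per distinct machine type."""
    return [unit_price(o) / o.unit_power_w for o in dedupe_by_machine(options)]


# Made-up example: a 200 W CPU exposing 64 vCPUs, rented in two sizes,
# plus one 400 W GPU rented singly.
g_cpu = vcpu_power(200.0, 64)
options = [
    RentalOption("cpu-a", 0.4, 8, g_cpu),
    RentalOption("cpu-a", 0.8, 16, g_cpu),  # same machine type, larger slice
    RentalOption("gpu-b", 2.0, 1, 400.0),
]
ratios = price_per_watt(options)
```

Keeping the first option per machine type is one simple choice; the caption only says that a single option is retained, not which one.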