Carbon-Aware Computing for Data Centers with Probabilistic Performance Guarantees

Sophie Hall, Francesco Micheli, Giuseppe Belgioioso, Ana Radovanović, Florian Dörfler

TL;DR

The paper addresses how to reduce the carbon footprint and peak power of data centers by coordinating when and where flexible compute jobs run across a fleet. It introduces a two-layer control approach: day-ahead planning via distributionally robust optimization, using a Wasserstein-distance ($d_W$) ambiguity set and a CVaR-based constraint, followed by real-time placement that tracks the planned schedule. Jointly optimizing the virtual capacity curves (VCCs) and the scheduling policy $Y$ yields provable probabilistic guarantees and exploits both the spatial and the temporal flexibility of jobs, including in response to demand response (DR) signals. Experiments on normalized load profiles from Google clusters show reductions in carbon cost and peak power compared with myopic greedy policies, with a tunable trade-off between robustness and performance. The results point to practical viability and to potential for DR participation and long-term grid planning, supported by a scalable LP reformulation and receding-horizon extensions.

Abstract

Data centers are significant contributors to carbon emissions and can strain power systems due to their high electricity consumption. To mitigate this impact and to participate in demand response programs, cloud computing companies strive to balance and optimize operations across their global fleets by making strategic decisions about when and where to place compute jobs for execution. In this paper, we introduce a load shaping scheme which reacts to time-varying grid signals by leveraging both temporal and spatial flexibility of compute jobs to provide risk-aware management guidelines and job placement with provable performance guarantees based on distributionally robust optimization. Our approach divides the problem into two key components: (i) day-ahead planning, which generates an optimal scheduling strategy based on historical load data, and (ii) real-time job placement and (time) scheduling, which dynamically tracks the optimal strategy generated in (i). We validate our method in simulation using normalized load profiles from randomly selected Google clusters, incorporating time-varying grid signals. We demonstrate significant reductions in carbon cost and peak power with our approach compared to myopic greedy policies, while maintaining computational efficiency and abiding by system and grid constraints.
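To make the contrast between a myopic greedy policy and carbon-aware load shifting concrete, the following is a minimal Python sketch for a single cluster. It is not the paper's method: it assumes the total daily load is known in advance, that every job may run at any hour of the day, and it ignores the peak-power term and all uncertainty. The function names and the toy numbers are hypothetical.

```python
def greedy_schedule(arrivals, capacity):
    """Myopic baseline: run load as soon as capacity allows, carry the rest forward."""
    schedule, backlog = [], 0.0
    for load, cap in zip(arrivals, capacity):
        backlog += load
        run = min(backlog, cap)
        schedule.append(run)
        backlog -= run
    return schedule


def carbon_aware_schedule(total_load, capacity, carbon):
    """Toy planner (assumption: fully flexible load): fill the lowest-carbon hours first."""
    schedule = [0.0] * len(capacity)
    remaining = float(total_load)
    # Visit hours in order of increasing carbon intensity.
    for t in sorted(range(len(carbon)), key=lambda t: carbon[t]):
        run = min(remaining, capacity[t])
        schedule[t] = run
        remaining -= run
    return schedule


def carbon_cost(schedule, carbon):
    """Carbon cost as load times carbon intensity, summed over hours."""
    return sum(x * c for x, c in zip(schedule, carbon))


# Hypothetical four-interval instance.
arrivals = [4, 4, 4, 4]
capacity = [8, 8, 8, 8]
carbon = [5, 1, 1, 5]

greedy = greedy_schedule(arrivals, capacity)
aware = carbon_aware_schedule(sum(arrivals), capacity, carbon)
print("greedy cost:", carbon_cost(greedy, carbon), "peak:", max(greedy))
print("aware cost: ", carbon_cost(aware, carbon), "peak:", max(aware))
```

In this toy instance the carbon cost drops from 48 to 16, but the peak load rises from 4 to 8, which illustrates why a planner that also cares about peak power cannot simply chase the cleanest hours.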

Paper Structure

This paper contains 22 sections, 2 theorems, 21 equations, 11 figures, 3 tables.

Key Result

Proposition 1

Assume that $s\in\mathcal{S}$. Then the DRO problem (eq:CVARprob_robust) can be reformulated as an LP, where $q\in\mathbb{R}$, $\lambda\in\mathbb{R}$, $p^i\in\mathbb{R}$, and $\eta_{idt}\in\mathbb{R}^{g}$ are auxiliary variables. (Footnote: the infinity norm term in the cost function (eq:Cost) can be reformulated in LP form, but we omit it here to focus on the constraint reformulation.)
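CVaR constraints lend themselves to LP reformulations because the CVaR of an empirical distribution can be written as a minimization over a single scalar threshold (the Rockafellar-Uryasev representation). The sketch below evaluates that representation directly on a list of samples. It is an illustration of the general technique, not the reformulation in Proposition 1: the function name, the `level` parameter, and the convention that larger values are worse are assumptions, and the threshold is called `q` only by analogy.

```python
def empirical_cvar(samples, level):
    """CVaR of `samples` at confidence `level` in [0, 1).

    Uses min over q of  q + mean(max(x - q, 0)) / (1 - level).
    The objective is convex and piecewise linear in q with breakpoints at the
    samples, so it is enough to evaluate it at each sample value.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 <= level < 1.0:
        raise ValueError("level must lie in [0, 1)")
    n = len(samples)

    def objective(q):
        excess = sum(max(x - q, 0.0) for x in samples)
        return q + excess / ((1.0 - level) * n)

    return min(objective(q) for q in samples)


# Hypothetical load samples: at level 0.5 the result is the mean of the worst half.
loads = [1.0, 2.0, 3.0, 4.0]
print(empirical_cvar(loads, 0.5))
print(empirical_cvar(loads, 0.0))
```

Inside an optimization problem, the same expression becomes linear once each `max(x - q, 0)` term is replaced by an auxiliary variable bounded below by `x - q` and by zero, which is the usual route from a CVaR constraint to an LP.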

Figures (11)

  • Figure 1: An example of the aggregate load schedule for compute loads from two flexibility classes $c\in \{1,2\}$ at a single cluster $d=1$ over $24$ hours. The VCC (solid line) $v_{t,1}$ limits the allocable load at each time interval $t$, while the true capacity $\overline{v}_{t,d}$ (dashed line) is obtained by subtracting the inflexible load from the (cluster) machine capacity.
  • Figure 2: Schematic of the two layered control approach separated into day-ahead planning and real-time execution.
  • Figure 3: Comparing load profiles under the optimal schedule $Y_{k,c,t,d}^*\cdot s_{k,c}^i, \forall t\in \mathcal{T}, d\in \mathcal{D}$ for 60 training scenarios $s^i_{\text{train}}, i\in \mathbb{Z}_{60}$ (left) and 15 validation scenarios $s^i_{\text{val}}, i\in \mathbb{Z}_{15}$ (right).
  • Figure 4: Comparing the running load of the DRO and greedy policy over one day.
  • Figure 5: Comparison of VCCs and the realised load distribution for discrete jobs submitted over the day for three different $\beta$ values.
  • ...and 6 more figures

Theorems & Definitions (10)

  • Definition 1
  • Definition 2: Wasserstein distance [esfahani2017data]
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Proposition 1
  • proof
  • Proposition 2
  • proof
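
Definition 2 above concerns the Wasserstein distance, which measures how far apart two probability distributions are and is used to define the ambiguity set around the historical data. As a rough illustration only, the sketch below computes the type-1 Wasserstein distance in the simplest setting: two one-dimensional empirical distributions with the same number of equally weighted samples, where the distance reduces to the mean absolute difference between sorted samples. The function name and the sample values are hypothetical, and the paper's setting is multi-dimensional.

```python
def wasserstein_1d(xs, ys):
    """Type-1 Wasserstein distance between two equally sized 1-D samples.

    Assumes each sample carries weight 1/n; in one dimension the optimal
    transport plan then matches the i-th smallest x to the i-th smallest y.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal length")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Hypothetical normalized loads: shifting every sample by 1 gives distance 1.
train = [0.0, 1.0, 2.0]
shifted = [1.0, 2.0, 3.0]
print(wasserstein_1d(train, shifted))
```

A larger ambiguity-set radius in this metric admits distributions that lie further from the training samples, which is what makes the resulting schedule more conservative.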