LACS: Learning-Augmented Algorithms for Carbon-Aware Resource Scaling with Uncertain Demand
Roozbeh Bostandoost, Adam Lechowicz, Walid A. Hanafy, Noman Bashir, Prashant Shenoy, Mohammad Hajiesmaili
TL;DR
This work addresses online carbon-aware resource scaling when job lengths are unknown. It formulates the Online Carbon-Aware Scaling with Unknown Demand problem (OCSU) and proposes LACS, a learning-augmented algorithm that uses predicted job lengths while preserving worst-case guarantees. Building on the Ramp-On Ramp-Off ($\texttt{RORO}$) framework, LACS combines robust baselines with a job-length predictor to achieve $(\alpha+\gamma)$-consistency and bounded robustness, both supported by formal competitive analyses. In an empirical evaluation on CAISO carbon traces, the carbon footprint of LACS is, on average, within $1.2\%$ of an online baseline that assumes perfect job length information and within $16\%$ of an offline baseline that additionally requires accurate carbon intensity forecasts, and it is $32\%$ lower than that of deadline-aware, carbon-agnostic execution. By handling uncertain job lengths and switching costs without depending on carbon intensity forecasts, LACS removes assumptions that have limited the practical deployment of prior carbon-aware scaling methods.
Abstract
Motivated by an imperative to reduce the carbon emissions of cloud data centers, this paper studies the online carbon-aware resource scaling problem with unknown job lengths (OCSU) and applies it to carbon-aware resource scaling for executing computing workloads. The task is to dynamically scale the resources (e.g., the number of servers) assigned to a job of unknown length so that it completes before a deadline, with the objective of reducing the carbon emissions of executing the workload. The total carbon emissions of executing a job comprise the emissions of running the job and the excess carbon emitted while switching between different scales (e.g., due to checkpoint and resume). Prior work on carbon-aware resource scaling assumes accurate job length information, while other approaches ignore switching losses and require carbon intensity forecasts. These assumptions prevent the practical deployment of prior work for online carbon-aware execution of scalable computing workloads. We propose LACS, a theoretically robust learning-augmented algorithm that solves OCSU. To improve practical average-case performance, LACS integrates machine-learned predictions of job length. To retain solid theoretical performance, LACS extends recent theoretical advances on online conversion with switching costs to handle a scenario where the job length is unknown. Our experimental evaluations demonstrate that, on average, the carbon footprint of LACS lies within 1.2% of the online baseline that assumes perfect job length information and within 16% of the offline baseline that, in addition to the job length, also requires accurate carbon intensity forecasts. Furthermore, LACS achieves a 32% reduction in carbon footprint compared to the deadline-aware carbon-agnostic execution of the job.
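As a rough illustration of the objective described above (running emissions plus excess emissions from switching between scales), the following sketch scores a fixed scaling schedule. It is not the LACS algorithm. The function name `carbon_footprint`, the linear running-emissions model, the per-server `switch_penalty`, and the convention that a job starts and ends at zero servers are all assumptions made for this example, and the numbers are invented.

```python
from typing import Sequence


def carbon_footprint(
    scale: Sequence[int],
    carbon_intensity: Sequence[int],
    switch_penalty: int,
) -> int:
    """Total emissions of a schedule: running cost plus switching cost.

    Assumptions (hypothetical, for illustration only):
    - running emissions in a slot are carbon_intensity[t] * scale[t];
    - each server added or removed emits `switch_penalty` units of carbon;
    - the job starts from and returns to a scale of zero servers.
    """
    if len(scale) != len(carbon_intensity):
        raise ValueError("scale and carbon_intensity must have equal length")

    # Emissions from actually running the job in each time slot.
    running = sum(c * x for c, x in zip(carbon_intensity, scale))

    # Emissions from changing the scale between consecutive slots,
    # including the initial ramp-up from 0 and the final ramp-down to 0.
    padded = [0, *scale, 0]
    switching = switch_penalty * sum(
        abs(b - a) for a, b in zip(padded, padded[1:])
    )
    return running + switching


# Made-up carbon intensities for four time slots before the deadline.
intensity = [300, 100, 100, 400]

# Both schedules perform 4 server-slots of work.
agnostic = carbon_footprint([1, 1, 1, 1], intensity, switch_penalty=20)
carbon_aware = carbon_footprint([0, 2, 2, 0], intensity, switch_penalty=20)
```

In this toy instance the constant schedule costs 940 units and the schedule that scales up during the low-carbon slots costs 480, even though the latter pays more for switching. An online algorithm for OCSU must make such scaling decisions without knowing future carbon intensities or the true job length.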
