Carbon-Aware Quality Adaptation for Energy-Intensive Services
Philipp Wiesner, Dennis Grinwald, Philipp Weiß, Patrick Wilhelm, Ramin Khalili, Odej Kao
TL;DR
The paper addresses reducing the carbon footprint of energy-intensive cloud services by adjusting the proportion of requests served at two quality-of-responses (QoR) tiers in response to temporal variation in grid carbon intensity. It proposes a forecast-based multi-horizon optimization that jointly determines deployments $D^i$ and request allocations $A^i$ to meet a target QoR while minimizing emissions $E^i = \sum_{m \in \mathcal{M}} \sum_{q \in \mathcal{Q}} d^{i}_{m,q} ( \Delta p^i_{m,q} C^i + C_m^{emb} )$. The problem is shown to be NP-hard via a reduction from Bin Packing, so the authors implement a two-tier online approach that combines long-term MILP planning with short-term adjustments, evaluated on one year of LLM inference traces across regions. Results show that carbon-aware QoR adaptation can yield up to about 10% additional savings beyond energy efficiency, and that the online approach under realistic forecast conditions reaches around 82% of an ideal upper bound. The work demonstrates a novel axis for carbon reduction in latency-constrained interactive services that does not rely on geo-distributed load balancing.
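As a rough illustration of the emissions expression above, the following Python sketch evaluates $E^i$ for a single interval. The reading of the symbols is an assumption made for this example rather than something stated in the summary: $d^i_{m,q}$ is taken as the number of deployments on machine type $m$ serving quality tier $q$, $\Delta$ as the interval length in hours, $p^i_{m,q}$ as power draw in kW, $C^i$ as grid carbon intensity in gCO2/kWh, and $C_m^{emb}$ as embodied emissions attributed to one deployment per interval. The function name, machine and tier labels, and all numbers are hypothetical.

```python
def interval_emissions(deployments, power_kw, carbon_intensity, embodied, delta_h=1.0):
    """Evaluate E^i = sum_m sum_q d[m,q] * (delta * p[m,q] * C + C_emb[m]).

    deployments:      {(machine, tier): count}   assumed meaning of d^i_{m,q}
    power_kw:         {(machine, tier): kW}      assumed meaning of p^i_{m,q}
    carbon_intensity: gCO2/kWh for the interval  assumed meaning of C^i
    embodied:         {machine: gCO2 per deployment per interval}
    delta_h:          interval length in hours   assumed meaning of Delta
    """
    total = 0.0
    for (machine, tier), count in deployments.items():
        operational = delta_h * power_kw[(machine, tier)] * carbon_intensity
        total += count * (operational + embodied[machine])
    return total


# Hypothetical example: one machine type, two quality tiers.
deployments = {("gpu_a", "high"): 2, ("gpu_a", "low"): 3}
power_kw = {("gpu_a", "high"): 1.0, ("gpu_a", "low"): 0.5}
embodied = {"gpu_a": 10.0}

# Same deployment plan evaluated under a cleaner and a dirtier grid.
e_clean = interval_emissions(deployments, power_kw, 200.0, embodied)
e_dirty = interval_emissions(deployments, power_kw, 400.0, embodied)
print(e_clean, e_dirty)
```

In this toy setting only the operational term scales with carbon intensity, which is why shifting requests toward the lower-power tier matters most when the grid is carbon-intensive, while the embodied term stays fixed per deployment.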
Abstract
The energy demand of modern cloud services, particularly those related to generative AI, is increasing at an unprecedented pace. To date, carbon-aware computing strategies have primarily focused on batch process scheduling or geo-distributed load balancing. However, such approaches are not applicable to services that require constant availability at specific locations due to latency, privacy, data, or infrastructure constraints. In this paper, we explore how the carbon footprint of energy-intensive services can be reduced by adjusting the fraction of requests served by different service quality tiers. We show that adapting this quality of responses with respect to grid carbon intensity can lead to additional carbon savings beyond resource and energy efficiency and introduce a forecast-based multi-horizon optimization that reaches close-to-optimal carbon savings.
