OpenDC-STEAM: Realistic Modeling and Systematic Exploration of Composable Techniques for Sustainable Datacenters

Dante Niewenhuis, Sacheendra Talluri, Alexandru Iosup, Tiziano de Matteis

Abstract

The need to reduce the carbon footprint of datacenters is urgent. While many sustainability techniques have been proposed, they are often evaluated in isolation, using limited setups or analytical models that overlook real-world dynamics and interactions between methods. This makes it challenging for researchers and operators to understand the effectiveness and trade-offs of combining such techniques. We design OpenDC-STEAM, an open-source customizable datacenter simulator, to investigate the individual and combined impact of sustainability techniques on datacenter operational and embodied carbon emissions, and their trade-off with performance. Using STEAM, we systematically explore three representative techniques - horizontal scaling, leveraging batteries, and temporal shifting - with diverse representative workloads, datacenter configurations, and carbon-intensity traces. Our analysis highlights that datacenter dynamics can influence the effectiveness of these techniques, and that combining them can significantly lower emissions but introduces complex cost-emissions-performance trade-offs that STEAM can help navigate. STEAM supports the integration of new models and techniques, making it a foundational framework for holistic, quantitative, and reproducible research in sustainable computing. Following open-science principles, STEAM is available as FOSS: https://github.com/atlarge-research/OpenDC-STEAM.

Paper Structure (41 sections, 19 figures, 2 tables)

This paper contains 41 sections, 19 figures, 2 tables.

Figures (19)

  • Figure 1: STEAM quantifies the impact and trade-offs of sustainability techniques. Results for the Surf workload are shown as (A) total carbon reduction [%], (B) peak power draw [kW], and (C) average task delay [h]. Stars indicate the best-performing technique for each metric. HS: Horizontal Scaling, TS: Temporal Shifting, B: Batteries.
  • Figure 2: Example of a scenario challenging for analytical models to evaluate correctly.
  • Figure 3: STEAM system architecture.
  • Figure 4: Example component graph depicting events connecting two new components: task stopper and GPU.
  • Figure 5: Impact of horizontal scaling on total carbon emissions (operational + embodied) and performance, quantified by SLA violations (tasks not scheduled within 24 hours of submission). For all metrics, lower is better. The gray line indicates the original scale, and the red line indicates the scale required for <1% SLA violations. The bottom row shows the impact when the datacenter is exposed to failures.
  • ...and 14 more figures
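
The two metrics named in the Figure 5 caption can be illustrated with a minimal sketch. This is not STEAM's implementation: the `Task` type, the function names, the units, and the per-interval accounting of operational carbon (energy multiplied by carbon intensity) are assumptions made for illustration. Only the 24-hour SLA window and the definition of total carbon as operational plus embodied come from the caption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SLA_WINDOW_HOURS = 24.0  # SLA window stated in the Figure 5 caption


@dataclass
class Task:
    """Hypothetical task record; times are hours since the start of the trace."""
    submit_hour: float
    start_hour: Optional[float]  # None if the task was never scheduled


def sla_violation_rate(tasks: Sequence[Task],
                       window: float = SLA_WINDOW_HOURS) -> float:
    """Fraction of tasks not scheduled within `window` hours of submission."""
    if not tasks:
        return 0.0
    violations = sum(
        1
        for t in tasks
        if t.start_hour is None or (t.start_hour - t.submit_hour) > window
    )
    return violations / len(tasks)


def total_carbon_kg(energy_kwh: Sequence[float],
                    intensity_g_per_kwh: Sequence[float],
                    embodied_kg: float) -> float:
    """Total carbon = operational + embodied.

    Operational carbon is assumed here to be the per-interval energy use
    multiplied by the carbon intensity of the grid in that interval.
    """
    if len(energy_kwh) != len(intensity_g_per_kwh):
        raise ValueError("energy and intensity traces must have equal length")
    operational_g = sum(e * ci for e, ci in zip(energy_kwh, intensity_g_per_kwh))
    return operational_g / 1000.0 + embodied_kg


if __name__ == "__main__":
    tasks = [Task(0, 1), Task(0, 30), Task(5, None), Task(10, 34)]
    print(sla_violation_rate(tasks))                       # 2 of 4 tasks violate
    print(total_carbon_kg([10, 20], [100, 300], 5.0))      # 7 kg operational + 5 kg embodied
```

A task that starts exactly 24 hours after submission is treated as meeting the SLA in this sketch; the caption does not say how the boundary case is handled.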