Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study

Philipp Wiesner; Soeren Becker; Brett Cornick; Dominik Scheinert; Alexander Acker; Odej Kao

TL;DR

This technical report presents a system that performs full-parameter LLM training across geo-distributed GPU clusters during regional curtailment windows, elastically switching between local single-site training and federated multi-site synchronization as sites become available or unavailable.

Abstract

Training large language models (LLMs) requires substantial compute and energy. At the same time, renewable energy sources regularly produce more electricity than the grid can absorb, leading to curtailment, the deliberate reduction of clean generation that would otherwise go to waste. These periods represent an opportunity: if training is aligned with curtailment windows, LLMs can be pretrained using electricity that is both clean and cheap. This technical report presents a system that performs full-parameter LLM training across geo-distributed GPU clusters during regional curtailment windows, elastically switching between local single-site training and federated multi-site synchronization as sites become available or unavailable. Our prototype trains a 561M-parameter transformer model across three clusters using the Flower federated learning framework, with curtailment periods derived from real-world marginal carbon intensity traces. Preliminary results show that curtailment-aware scheduling preserves training quality while reducing operational emissions to 5-12% of single-site baselines.

Paper Structure (30 sections, 1 equation, 4 figures, 1 table)

Introduction
Related Work
Carbon-aware computing.
Exploiting curtailment windows.
Sustainable federated learning.
Distributed LLM pretraining.
System Design
Problem Setting
Architecture and Execution Model
Curtailment-Aware Provisioning
Round Sizing and Overhead
Data Management
Implementation
Provisioning and cluster integration.
Federation orchestration.
...and 15 more sections

Figures (4)

Figure 1: Sites train only during curtailment windows (green), when renewable generation exceeds demand. If multiple sites are curtailed simultaneously, they train locally in parallel and periodically average model states.
Figure 2: Execution timeline of curtailment-aware pretraining, showing how training shifts across renewable curtailment windows in California, Texas, and South Australia.
Figure 3: Curtailment-aware training reaches perplexity comparable to that of centralized training.
Figure 4: Fraction of training energy drawn during curtailment windows.
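
The execution model summarized in Figure 1 (train only at curtailed sites, and average model states when several sites are curtailed at once) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Flower API. The names `select_mode` and `average_states`, the mode labels, and the representation of a model state as a dictionary of flat float lists are all assumptions made for illustration, and uniform averaging is assumed because the text does not specify any weighting.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: parameter name -> flat list of values.
State = Dict[str, List[float]]


def select_mode(curtailed: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Pick an execution mode from the set of currently curtailed sites.

    Assumed labels: "idle" (no site curtailed), "local" (one site trains
    alone), "federated" (several sites train in parallel and synchronize).
    """
    active = sorted(site for site, is_curtailed in curtailed.items() if is_curtailed)
    if not active:
        return "idle", active
    if len(active) == 1:
        return "local", active
    return "federated", active


def average_states(states: Sequence[State]) -> State:
    """Uniformly average model states, parameter by parameter.

    Uniform weighting is an assumption of this sketch.
    """
    if not states:
        raise ValueError("need at least one model state to average")
    n = len(states)
    return {
        name: [sum(s[name][i] for s in states) / n for i in range(len(values))]
        for name, values in states[0].items()
    }


if __name__ == "__main__":
    # Site names follow the regions shown in Figure 2.
    mode, sites = select_mode(
        {"california": True, "texas": False, "south_australia": True}
    )
    print(mode, sites)
    print(average_states([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
```

In this sketch, a scheduler would call `select_mode` whenever a site's curtailment status changes and call `average_states` at the end of each synchronization round while in the federated mode.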
