OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

Sami Jaghouar; Jack Min Ong; Johannes Hagemann

TL;DR

OpenDiLoCo addresses the challenge of training large language models across globally distributed, bandwidth-constrained workers. It implements the Distributed Low-Communication (DiLoCo) training paradigm in an open-source framework, providing both a compact PyTorch reference implementation and a Hivemind-based implementation that enable replication and real-world decentralized deployments. The method is a local SGD variant with two optimizers: each worker keeps two copies of the model, trains one locally with an inner optimizer, and uses its difference from the other copy as a pseudo-gradient, which is averaged across workers (optionally with an FP16 all-reduce) and applied by an outer optimizer. The authors reproduce DiLoCo on a 150M-parameter model and scale it to 1.1B parameters, matching or improving on baseline perplexity while communicating far less, and reach 90-95% compute utilization in a globally distributed setting. They also show that FP16 all-reduce works well for pseudo-gradients, explore scaling through integration with PyTorch FSDP, and discuss opportunities for asynchronous training and future work on larger models. Overall, OpenDiLoCo offers a practical, open-source path to globally distributed low-communication training, with solid empirical support and clear directions for further scaling and efficiency gains.
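The dual-optimizer scheme described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the OpenDiLoCo code: each hypothetical worker minimizes its own one-dimensional quadratic loss, the inner optimizer is plain SGD (the original DiLoCo recipe uses AdamW), and the outer optimizer is SGD with Nesterov momentum whose learning rate and momentum are illustrative defaults. The names `inner_steps`, `NesterovOuterOptimizer`, and `diloco` are hypothetical.

```python
from typing import List


def inner_steps(theta: float, target: float, lr: float, steps: int) -> float:
    """Run `steps` plain-SGD updates on a worker's local copy of the model."""
    local = theta  # local copy; the global copy `theta` is left untouched
    for _ in range(steps):
        grad = local - target  # gradient of 0.5 * (local - target) ** 2
        local -= lr * grad
    return local


class NesterovOuterOptimizer:
    """SGD with Nesterov momentum, applied to the averaged pseudo-gradient."""

    def __init__(self, lr: float = 0.7, momentum: float = 0.9) -> None:
        self.lr = lr
        self.momentum = momentum
        self.buf = 0.0  # momentum buffer

    def step(self, theta: float, pseudo_grad: float) -> float:
        self.buf = self.momentum * self.buf + pseudo_grad
        update = pseudo_grad + self.momentum * self.buf  # Nesterov look-ahead
        return theta - self.lr * update


def diloco(targets: List[float], rounds: int, local_steps: int,
           inner_lr: float = 0.1) -> float:
    """Toy DiLoCo loop: one hypothetical worker per entry in `targets`."""
    theta = 0.0  # global copy, identical on every worker after each round
    outer = NesterovOuterOptimizer()
    for _ in range(rounds):
        # Every worker starts from the global copy and trains independently.
        local_params = [inner_steps(theta, t, inner_lr, local_steps)
                        for t in targets]
        # Pseudo-gradient = global copy minus locally trained copy; averaging
        # it is the only communication in each round (an all-reduce).
        pseudo_grads = [theta - p for p in local_params]
        avg = sum(pseudo_grads) / len(pseudo_grads)
        theta = outer.step(theta, avg)
    return theta


print(diloco([1.0, 2.0, 3.0, 4.0], rounds=30, local_steps=50))
```

With 4 workers, 30 outer rounds, and 50 local steps, the workers synchronize 30 times instead of the 1,500 times a per-step data-parallel run would need. Because every toy loss here has the same curvature, the shared parameter converges to the mean of the worker targets (2.5 for targets 1 to 4); real models offer no such guarantee.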

Abstract

OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiLoCo experiments and offer it within a scalable, decentralized training framework built on the Hivemind library. We demonstrate its effectiveness by training a model across two continents and three countries while maintaining 90-95% compute utilization. Additionally, we conduct ablation studies on the algorithm's compute efficiency and its scalability in the number of workers, and we show that its pseudo-gradients can be all-reduced in FP16 without any performance degradation. Furthermore, we scale OpenDiLoCo to 3x the size of the original work, demonstrating its effectiveness for billion-parameter models.
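The FP16 all-reduce claim can be illustrated with a small simulation. This sketch makes several assumptions that do not come from the OpenDiLoCo code: a plain Python loop stands in for the real collective (for example NCCL or Hivemind), every value and every partial sum is rounded to IEEE 754 half precision using the `struct` module's `'e'` format, the pseudo-gradients are synthetic, and all helper names are hypothetical. It compares the FP16 average with a full-precision reference and computes the size of each worker's message.

```python
import struct
from typing import List


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def allreduce_mean_fp16(worker_grads: List[List[float]]) -> List[float]:
    """Average per-worker vectors, keeping every partial sum in FP16."""
    n = len(worker_grads)
    out = []
    for column in zip(*worker_grads):  # one parameter across all workers
        acc = 0.0
        for value in column:
            acc = to_fp16(acc + to_fp16(value))  # values travel as FP16
        out.append(to_fp16(acc / n))
    return out


def allreduce_mean_reference(worker_grads: List[List[float]]) -> List[float]:
    """Reference average in Python's double precision."""
    n = len(worker_grads)
    return [sum(column) / n for column in zip(*worker_grads)]


def payload_bytes(num_params: int, bytes_per_value: int) -> int:
    """Size of one worker's pseudo-gradient message, ignoring overheads."""
    return num_params * bytes_per_value


# Synthetic pseudo-gradients for 8 hypothetical workers, 100 parameters each.
grads = [[0.001 * (k + 1) * (i % 7 - 3) for i in range(100)] for k in range(8)]
fp16 = allreduce_mean_fp16(grads)
ref = allreduce_mean_reference(grads)
print(max(abs(a - b) for a, b in zip(fp16, ref)))
print(payload_bytes(1_100_000_000, 4), payload_bytes(1_100_000_000, 2))
# → 4400000000 2200000000  (printed by the second print call)
```

Halving the bytes per value halves the data each worker contributes per synchronization (for a 1.1B-parameter model, 2.2 GB instead of 4.4 GB before protocol overhead), while the rounding error in this toy stays small relative to the averaged values.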

Paper Structure

This paper contains 15 sections, 8 figures, and 3 tables.

Introduction
Implementation
General Implementation Details
Implementation with torch.distributed
Hivemind Implementation
Experiments
Replication Experiment Setup
Main Results
Number of Workers and FLOP Efficiency Ablation
Practical Usage
All-Reduce in FP16
Scaling DiLoCo to Billion Parameter Models
Globally Distributed Training Setting
Conclusion
Model Configuration

Figures (8)

Figure 1: Pseudo-Code for Outer Optimizer in OpenDiLoCo.
Figure 2: OpenDiLoCo - Hivemind API.
Figure 3: Main result: pre-training a 150-million-parameter Llama model with 8 DiLoCo workers yields significantly lower perplexity than the baseline without DiLoCo, and even lower perplexity than a baseline with an 8x larger batch size and the same compute budget, while communicating 500 times less.
Figure 4: Ablation Study on the Number of Workers in DiLoCo: Performance comparison of DiLoCo with different numbers of workers and 50 local steps against the baseline without DiLoCo. Due to compute constraints, these ablation experiments were not extended to 88,000 steps like the other experiments.
Figure 5: Ablation Study on FLOP Efficiency Relative to Number of Workers in DiLoCo: This figure compares the performance of DiLoCo with different numbers of workers and 50 local steps against the baseline without DiLoCo. The x-axis shows the global steps instead of local steps, providing a better approximation of DiLoCo's FLOP efficiency by comparing the total amount of compute spent on the model.
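The communication and compute bookkeeping behind Figures 3 to 5 can be made explicit with simple arithmetic. This is a hedged sketch resting on assumptions the captions do not state: the data-parallel baseline is assumed to all-reduce gradients once per optimizer step, DiLoCo is assumed to synchronize once every `sync_every` local steps (so the 500-fold reduction in Figure 3 would correspond to 500 local steps per round), "global steps" is read as local steps summed over all workers, and the 88,000-step run length is borrowed from the Figure 4 caption. The function name `diloco_budget` is hypothetical.

```python
from typing import Dict


def diloco_budget(workers: int, local_steps: int, sync_every: int) -> Dict[str, int]:
    """Count synchronizations and total steps for one hypothetical run.

    workers:     number of DiLoCo workers
    local_steps: inner optimizer steps taken by each worker
    sync_every:  local steps between outer (pseudo-gradient) synchronizations
    """
    if local_steps % sync_every != 0:
        raise ValueError("local_steps must be a multiple of sync_every")
    diloco_syncs = local_steps // sync_every
    baseline_syncs = local_steps  # assumption: one all-reduce per step
    return {
        "diloco_syncs": diloco_syncs,
        "baseline_syncs": baseline_syncs,
        "communication_reduction": baseline_syncs // diloco_syncs,
        # assumption: "global steps" = local steps summed over all workers
        "global_steps": workers * local_steps,
    }


# 8 workers, 500 local steps per round, 88,000 local steps per worker.
print(diloco_budget(8, 88_000, 500))
# → {'diloco_syncs': 176, 'baseline_syncs': 88000, 'communication_reduction': 500, 'global_steps': 704000}
```

Under this reading, plotting against `global_steps` moves runs with more workers further right on the x-axis in proportion to the compute they consume, which is the sense in which the Figure 5 view approximates FLOP efficiency better than per-worker steps.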
