Incentivizing Permissionless Distributed Learning of LLMs
Joel Lidin, Amir Sarfi, Evangelos Pappas, Samuel Dare, Eugene Belilovsky, Jacob Steeves
TL;DR
Gauntlet tackles the challenge of incentivizing permissionless distributed learning of large language models with a two-phase on-chain incentive mechanism built around loss-based evaluation of each contribution. A compute-intensive LossScore/LossRating core is paired with a lightweight fast evaluation and a Proof of Computation that deters cheating, all within a synchronous training framework that uses DeMo compression. In a live run training a 1.2B-parameter model on the Bittensor network, the system achieved competitive per-iteration convergence and distributed real-valued rewards to contributors, demonstrating the feasibility of open, market-driven decentralized AI development. The work also addresses the practical aspects of validator consensus, Byzantine fault tolerance, and cloud-based communication needed for scalable, open collaboration on foundational models.
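The two-phase structure can be illustrated with a minimal Python sketch. It assumes that the fast evaluation reduces to two boolean checks per peer (hypothetical `submitted` and `in_sync` flags) and that the expensive loss-based score is supplied as a callable that runs only on peers passing the first phase. Clipping negative scores to zero and normalizing into reward weights are illustrative assumptions, not Gauntlet's actual on-chain formula.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Peer:
    uid: int
    submitted: bool  # hypothetical: uploaded a pseudo-gradient this window (uptime/reliability)
    in_sync: bool    # hypothetical: model state matches the shared reference (synchronization)


def fast_filter(peers: List[Peer]) -> List[Peer]:
    """Phase 1 (sketch): cheap checks that can be run on every peer."""
    return [p for p in peers if p.submitted and p.in_sync]


def reward_weights(peers: List[Peer], loss_score: Callable[[Peer], float]) -> Dict[int, float]:
    """Phase 2 (sketch): run the expensive loss-based score only on peers that
    passed phase 1, clip negative contributions to zero, normalize to weights."""
    scores = {p.uid: max(loss_score(p), 0.0) for p in fast_filter(peers)}
    total = sum(scores.values())
    return {p.uid: (scores.get(p.uid, 0.0) / total if total > 0 else 0.0) for p in peers}


peers = [Peer(0, True, True), Peer(1, True, False), Peer(2, False, True), Peer(3, True, True)]
# Toy scores exist only for peers 0 and 3; peers 1 and 2 are never scored.
weights = reward_weights(peers, lambda p: {0: 3.0, 3: 1.0}[p.uid])
print(weights)  # {0: 0.75, 1: 0.0, 2: 0.0, 3: 0.25}
```

The scoring callable is never invoked for peers 1 and 2 (the dictionary lookup would fail if it were), which mirrors the purpose of the first phase: the costly evaluation is spent only on peers that are up and synchronized.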
Abstract
We describe an incentive system for distributed deep learning of foundational models in which peers are rewarded for their contributions. The incentive system, Gauntlet, has been deployed on the Bittensor blockchain and used to train a 1.2B LLM from completely permissionless contributions of pseudo-gradients, with no control over which users can register or what hardware they use. Gauntlet can be applied to any synchronous distributed training scheme that relies on aggregating updates or pseudo-gradients. We rely on a two-stage mechanism: a fast filtering stage that checks peer uptime, reliability, and synchronization, combined with a core component that estimates the loss before and after each individual pseudo-gradient contribution. We use an OpenSkill rating system to track the competitiveness of pseudo-gradient scores over time. Finally, we introduce a novel mechanism to ensure that peers on the network perform unique computations. Our live 1.2B run, which has paid out real-valued tokens to participants based on the value of their contributions, yielded a 1.2B model that is competitive on a per-iteration basis, demonstrating the utility of our incentive system.
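The core idea of scoring a contribution by the loss before and after it is applied can be made concrete with a toy example. The sketch below assumes a linear model with mean-squared-error loss, a single evaluation batch, and a pseudo-gradient applied as one step `w - lr * g`. The names `mse_loss` and `loss_score` are hypothetical. The sketch does not reproduce Gauntlet's actual LossScore, including which data it is evaluated on and how results feed into the OpenSkill ratings.

```python
from typing import List, Sequence, Tuple

# One evaluation batch of (features, target) pairs for a toy linear model.
Batch = List[Tuple[List[float], float]]


def mse_loss(w: Sequence[float], batch: Batch) -> float:
    """Mean squared error of the linear prediction y_hat = w . x over the batch."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in batch) / len(batch)


def loss_score(w: Sequence[float], pseudo_grad: Sequence[float], batch: Batch, lr: float) -> float:
    """Hypothetical score: loss before minus loss after applying one peer's
    pseudo-gradient. Positive means the contribution reduced the loss."""
    before = mse_loss(w, batch)
    w_after = [wi - lr * gi for wi, gi in zip(w, pseudo_grad)]
    after = mse_loss(w_after, batch)
    return before - after


batch = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
w0 = [0.0, 0.0]
# The true gradient of the loss at w0 is [-2, 1]; its negation is a harmful update.
print(loss_score(w0, [-2.0, 1.0], batch, lr=0.5))  # 1.875
print(loss_score(w0, [2.0, -1.0], batch, lr=0.5))  # -3.125
```

A positive score means the contribution lowered loss on the evaluation batch. A negative score flags an unhelpful or adversarial pseudo-gradient. Tracking how such scores compare across peers over time is the role the abstract assigns to the OpenSkill rating system.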
