Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices
Tao Shen, Didi Zhu, Ziyu Zhao, Zexi Li, Chao Wu, Fei Wu
TL;DR
The paper argues that scaling laws for foundation models are approaching data and compute bottlenecks because high-quality public data is finite and compute power is centralized. It proposes leveraging massive distributed edge devices to democratize AI by using edge-generated data and aggregated edge compute for training large models. It surveys data and compute trends, edge-data advantages, and technical advances in small language models, collaborative inference, and on-device/collaborative training, while outlining open problems in heterogeneous device fusion and compute sharing. If these problems are solved, edge-based distributed training could broaden participation in AI development, reduce environmental impact, and reshape the AI landscape toward more diverse, locally adaptable models.
Abstract
The remarkable success of foundation models has been driven by scaling laws, demonstrating that model performance improves predictably with increased training data and model size. However, this scaling trajectory faces two critical challenges: the depletion of high-quality public data, and the prohibitive computational power required for larger models, which has been monopolized by tech giants. These two bottlenecks pose significant obstacles to the further development of AI. In this position paper, we argue that leveraging massive distributed edge devices can break through these barriers. We reveal the vast untapped potential of data and computational resources on massive edge devices, and review recent technical advancements in distributed/federated learning that make this new paradigm viable. Our analysis suggests that, by collaborating across edge devices, everyone can participate in training large language models using small edge devices. This paradigm shift towards distributed training on the edge has the potential to democratize AI development and foster a more inclusive AI community.
