Implementation of Big AI Models for Wireless Networks with Collaborative Edge Computing
Liekang Zeng, Shengyuan Ye, Xu Chen, Yang Yang
TL;DR
The paper presents collaborative edge training as a framework to leverage a pool of trusted edge devices for sustainable, privacy-preserving training of large Transformer-based AI models at the wireless edge. It contrasts this approach with cloud, on-device, and federated methods, arguing that collaboration within trusted domains can improve efficiency and privacy while enabling training of models that exceed single-device capacities. A four-phase framework is proposed: profile participants, design energy-aware parallelism, replicate model partitions across devices, and run in-domain training with inter-device data exchange. A case study analyzes four Transformer parallelisms (data, sequence, tensor, pipeline) on realistic Jetson-based testbeds, showing that data and pipeline parallelism with GPU acceleration achieve favorable energy efficiency and enable training of large models that other parallelisms cannot support due to memory or communication limits. The paper also outlines open challenges across sustainability metrics, orchestration, incentives, wireless-network design, and privacy/security tradeoffs, pointing to future research directions for sustainable edge-centric AI training.
Abstract
Big Artificial Intelligence (AI) models have emerged as a crucial element in various intelligent applications at the edge, such as voice assistants in smart homes and autonomous robotics in smart factories. Training big AI models, e.g., for personalized fine-tuning and continual model refinement, poses significant challenges to edge devices due to the inherent conflict between their limited computing resources and the intensive workload of training. Given these constraints on on-device training, traditional approaches usually aggregate the training data and send it to a remote cloud for centralized training. However, this approach is neither sustainable, since it strains long-range backhaul transmission and relies on energy-consuming datacenters, nor private, since it shares users' raw data with remote infrastructure. To address these challenges, we observe that prevalent edge environments usually contain a diverse collection of trusted edge devices with untapped idle resources, which can be leveraged to accelerate edge training. Motivated by this, in this article we propose collaborative edge training, a novel training mechanism that orchestrates a group of trusted edge devices as a resource pool for expedited, sustainable big AI model training at the edge. As an initial step, we present a comprehensive framework for building collaborative edge training systems and analyze in depth its merits and sustainable scheduling choices following its workflow. To further investigate the impact of parallelism design, we empirically study four typical parallelisms from the perspective of energy demand on realistic testbeds. Finally, we discuss open challenges for sustainable collaborative edge training to point to future directions for edge-centric big AI model training.
