Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G
Xiaoxue Yu, Xingfu Yi, Rongpeng Li, Fei Wang, Chenghui Peng, Zhifeng Zhao, Honggang Zhang
TL;DR
Snake Learning addresses core challenges of 6G distributed learning through serpentine, layer-wise updates that sequentially train designated middle layers across heterogeneous nodes, reducing synchronization, memory, and communication demands. The framework integrates Service Provider, Process Controller/Engine, and Local Manager/Engine components, leverages Knowledge Distillation to mitigate inter-node data heterogeneity, and supports client-server (CS) and peer-to-peer (P2P) deployment modes. Feasibility studies on CIFAR-10 with VGG-11 and on LLM fine-tuning with OPT-1.3B and Llama-3-8B demonstrate faster convergence, substantially lower memory usage (e.g., from ~19.37 GB to ~3.13 GB), and notable communication savings while maintaining competitive accuracy. These results indicate strong potential for edge-native AIaaS in 6G. Open research directions include API interoperability, fine-grained layer assignment, and robust resource scheduling across dynamic networks.
Abstract
In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks such as Federated Learning and Split Learning often struggle in dynamic network environments, where they face high synchronization demands, costly communication overhead, severe computing resource consumption, and data heterogeneity across network nodes. These obstacles hinder the application of the ubiquitous computing capabilities of 6G networks, especially as model parameter counts and training data volumes continue to grow. To address these challenges, this paper introduces "Snake Learning", a cost-effective distributed learning framework. Specifically, Snake Learning respects the heterogeneity of inter-node computing capability and local data distribution in 6G networks, and sequentially trains a designated subset of model layers on individual nodes. This layer-by-layer serpentine update mechanism significantly reduces the storage, memory, and communication requirements of the model training phase, and demonstrates superior adaptability and efficiency for both classification and fine-tuning tasks across homogeneous and heterogeneous data distributions.
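The layer-by-layer, node-by-node update order described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `serpentine_schedule` and `trainable_fraction`, the grouping of layers into "blocks", and the choice to reverse the node order on alternate blocks are all hypothetical assumptions made for illustration. The sketch only shows that a single (node, block) pair is active at a time, so each node needs gradients and optimizer state for just a fraction of the model.

```python
from typing import List, Tuple


def serpentine_schedule(num_nodes: int, num_blocks: int) -> List[Tuple[int, int]]:
    """Return (node, block) training steps in a serpentine order.

    Assumption (illustrative only): every node trains the current layer
    block in turn before training moves to the next block, and the node
    order is reversed on odd-numbered blocks so that the node finishing
    one block is the first to start the next.
    """
    schedule: List[Tuple[int, int]] = []
    for block in range(num_blocks):
        nodes = list(range(num_nodes))
        if block % 2 == 1:
            nodes.reverse()
        for node in nodes:
            schedule.append((node, block))
    return schedule


def trainable_fraction(block_sizes: List[int], block: int) -> float:
    """Fraction of all parameters that is trainable while `block` is active."""
    return block_sizes[block] / sum(block_sizes)


if __name__ == "__main__":
    # Hypothetical setup: 3 nodes, a model split into 2 layer blocks.
    for node, block in serpentine_schedule(num_nodes=3, num_blocks=2):
        print(f"node {node} trains block {block}; all other blocks stay frozen")
```

Under these assumptions, only one node is active per step, so no cross-node synchronization barrier is needed, and the per-step trainable share of the model is the active block's size divided by the total parameter count.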
