SplitCom: Communication-efficient Split Federated Fine-tuning of LLMs via Temporal Compression
Tao Li, Yulin Tang, Yiyang Song, Cong Wu, Xihui Liu, Pan Li, Xianhao Chen
TL;DR
SplitCom tackles the high communication costs of split federated fine-tuning for large language models by exploiting inter-epoch temporal redundancy in activations. It introduces similarity-aware activation reuse with two adaptive threshold controls (bang-bang and DDPG-based RL) and uses random projection to reduce client-side memory. The framework extends to a privacy-preserving U-shape SplitCom, enabling bidirectional temporal compression of activations and gradients while keeping labels on the client. Experiments show uplink communication reductions of up to 98.6% in the standard configuration and total communication reductions of up to 95.8% in the U-shape variant, with negligible performance loss, enabling practical on-device fine-tuning of LLMs under tight resource constraints.
Abstract
Federated fine-tuning of on-device large language models (LLMs) mitigates privacy concerns by avoiding raw data sharing. However, its intensive computational and memory demands pose significant challenges for resource-constrained edge devices. Split federated learning (SFL) is a promising solution: it partitions the model into a lightweight client-side sub-model and a compute-intensive server-side sub-model, thus offloading the primary training workload to a powerful server. Nevertheless, exchanging high-dimensional activations in SFL incurs excessive communication overhead. To address this, we propose SplitCom, a communication-efficient SFL framework for LLMs that exploits temporal redundancy in activations across consecutive training epochs. Inspired by video compression, the core idea is to upload an activation only when it deviates noticeably from its counterpart in previous epochs. To balance communication efficiency and learning performance, we introduce two adaptive threshold control schemes based on 1) bang-bang control or 2) deep deterministic policy gradient (DDPG)-based reinforcement learning. We also apply dimensionality reduction to alleviate client-side memory requirements. Furthermore, we extend SplitCom to the U-shape architecture, ensuring the server never accesses clients' labels. Extensive simulations and laboratory experiments demonstrate that SplitCom reduces uplink communication costs by up to 98.6% in its standard configuration and total communication costs by up to 95.8% in its U-shape variant without noticeably compromising model performance.
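
As a rough illustration of the three mechanisms named in the abstract (similarity-gated uploading, a bang-bang threshold, and dimensionality reduction on the client), the following Python sketch may help. It is not the authors' implementation. The cosine similarity metric, the Gaussian projection, the loss-based switching rule, the two threshold levels, and the names `ActivationReuseClient` and `bang_bang_threshold` are all assumptions made for this example.

```python
import numpy as np


class ActivationReuseClient:
    """Hypothetical client-side gate deciding whether an activation is uploaded."""

    def __init__(self, dim, proj_dim, threshold, seed=0):
        rng = np.random.default_rng(seed)
        # Gaussian random projection (assumed): only proj_dim numbers are
        # cached per sample instead of the full dim-sized activation.
        self.proj = rng.standard_normal((dim, proj_dim)) / np.sqrt(proj_dim)
        self.threshold = threshold
        self.cache = {}  # sample id -> projection of the last uploaded activation

    def should_upload(self, sample_id, activation):
        sketch = activation.ravel() @ self.proj
        prev = self.cache.get(sample_id)
        if prev is not None:
            denom = np.linalg.norm(prev) * np.linalg.norm(sketch)
            # Cosine similarity in the projected space (assumed metric).
            sim = float(prev @ sketch / denom) if denom > 0 else 0.0
            if sim >= self.threshold:
                return False  # similar enough: the server reuses its stored copy
        # First sighting or noticeable deviation: upload and refresh the cache.
        self.cache[sample_id] = sketch
        return True


def bang_bang_threshold(loss, prev_loss, low=0.90, high=0.99):
    """Two-level switch (assumed rule): strict threshold if loss rose, loose otherwise."""
    return high if loss > prev_loss else low


client = ActivationReuseClient(dim=64, proj_dim=16, threshold=0.95)
x = np.ones(64)
first = client.should_upload("sample-0", x)    # nothing cached yet
second = client.should_upload("sample-0", x)   # identical activation
third = client.should_upload("sample-0", -x)   # opposite direction
client.threshold = bang_bang_threshold(loss=1.2, prev_loss=1.0)
```

In this toy run the first activation is uploaded, the identical repeat is skipped, and the reversed activation is uploaded again. The cache holds the projection of the last uploaded activation rather than the last seen one, so the client compares against what the server actually stores.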
