SplitCom: Communication-efficient Split Federated Fine-tuning of LLMs via Temporal Compression

Tao Li; Yulin Tang; Yiyang Song; Cong Wu; Xihui Liu; Pan Li; Xianhao Chen


TL;DR

SplitCom tackles the high communication costs of split federated fine-tuning for large language models by exploiting inter-epoch temporal redundancy in activations. It introduces similarity-aware activation reuse with two adaptive threshold controls (bang-bang and DDPG-based RL) and uses random projection to reduce client-side memory. The framework extends to privacy-preserving U-shape SplitCom, enabling bidirectional temporal compression of activations and gradients while keeping labels on the client. Experimental results show substantial uplink and total communication reductions (up to 98.6% and 95.8%, respectively) with negligible performance loss, enabling practical on-device fine-tuning of LLMs under tight resource constraints.
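A minimal sketch of similarity-aware activation reuse follows. It assumes that each client keeps a random-projection sketch of the last activation it uploaded for every training sample, and that the server keeps the corresponding full activation. The class `ActivationReuseClient`, its parameters (`sketch_dim`, `threshold`), and the "reuse"/"upload" messages are hypothetical. The paper's exact similarity measure, projection, and signaling may differ. Cosine similarity is used here because it is the metric reported for inter-epoch activations.

```python
import math
import random


class ActivationReuseClient:
    """Hypothetical client-side cache for similarity-aware activation reuse.

    Assumption: only a Gaussian random projection ("sketch") of the last
    uploaded activation is stored per sample, to save client memory.
    Illustrative only; not SplitCom's exact design.
    """

    def __init__(self, dim, sketch_dim, threshold, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(sketch_dim)
        # Fixed random projection matrix (sketch_dim x dim), shared across epochs.
        self.proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
                     for _ in range(sketch_dim)]
        self.threshold = threshold
        self.cache = {}  # sample_id -> sketch of the last *uploaded* activation

    def _sketch(self, x):
        # Project a dim-length activation down to sketch_dim numbers.
        return [sum(w * v for w, v in zip(row, x)) for row in self.proj]

    @staticmethod
    def _cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        na = math.sqrt(sum(p * p for p in a))
        nb = math.sqrt(sum(q * q for q in b))
        if na == 0.0 or nb == 0.0:
            return 0.0
        return dot / (na * nb)

    def step(self, sample_id, activation):
        """Return ("reuse", None) or ("upload", activation) for one sample."""
        sketch = self._sketch(activation)
        cached = self.cache.get(sample_id)
        if cached is not None and self._cosine(sketch, cached) >= self.threshold:
            # Small deviation: the server reuses its stored copy; nothing is sent.
            return "reuse", None
        # First epoch or noticeable deviation: upload and refresh the cache.
        self.cache[sample_id] = sketch
        return "upload", activation


if __name__ == "__main__":
    client = ActivationReuseClient(dim=8, sketch_dim=4, threshold=0.95)
    a = [1.0, 0.5, -0.2, 0.3, 0.0, 0.7, -0.1, 0.4]
    print(client.step(0, a)[0])                      # upload (no cached copy yet)
    print(client.step(0, [v * 1.01 for v in a])[0])  # reuse (cosine ~ 1.0)
    print(client.step(0, [-v for v in a])[0])        # upload (cosine = -1.0)
```

In this sketch the client compares against the last *uploaded* activation rather than the previous epoch's activation. Small per-epoch changes therefore accumulate until they cross the threshold and force an upload, which keeps the server's stale copy from drifting without bound. Gaussian random projection approximately preserves inner products (Johnson-Lindenstrauss), so cosine similarity on the sketches tracks cosine similarity on the full activations while storing only `sketch_dim` numbers per sample.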

Abstract

Federated fine-tuning of on-device large language models (LLMs) mitigates privacy concerns by avoiding raw data sharing. However, its intensive computational and memory demands pose significant challenges for resource-constrained edge devices. Split federated learning (SFL) has emerged as a promising solution: it partitions the model into a lightweight client-side sub-model and a compute-intensive server-side sub-model, thereby offloading most of the training workload to a powerful server. Nevertheless, exchanging high-dimensional activations in SFL incurs excessive communication overhead. To address this, we propose SplitCom, a communication-efficient SFL framework for LLMs that exploits temporal redundancy in activations across consecutive training epochs. Inspired by video compression, SplitCom uploads an activation only when it deviates noticeably from its counterpart in previous epochs. To balance communication efficiency and learning performance, we introduce two adaptive threshold control schemes based on 1) bang-bang control or 2) deep deterministic policy gradient (DDPG)-based reinforcement learning. Moreover, we apply dimensionality reduction techniques to alleviate client-side memory requirements. Furthermore, we extend SplitCom to the U-shape architecture, ensuring that the server never accesses clients' labels. Extensive simulations and laboratory experiments demonstrate that SplitCom reduces uplink communication costs by up to 98.6% in its standard configuration and total communication costs by up to 95.8% in its U-shape variant without noticeably compromising model performance.
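A minimal sketch of bang-bang threshold control follows. It assumes the controller observes the training loss once per epoch and switches the similarity threshold between two fixed levels, with no intermediate values. The class `BangBangThreshold`, the loss-based switching rule, and the default levels (`tau_low=0.90`, `tau_high=0.99`, `min_improvement=1e-3`) are illustrative assumptions, not values from the paper. A higher threshold means an activation must be more similar to its cached copy to be reused, so more activations are uploaded.

```python
class BangBangThreshold:
    """Hypothetical bang-bang controller for the activation-similarity threshold.

    Assumption: the monitored signal is the per-epoch training loss. If the loss
    fails to improve by at least `min_improvement`, the threshold jumps to
    tau_high (strict, more uploads); otherwise it jumps to tau_low (lenient,
    more reuse). Illustrative only; not SplitCom's exact rule.
    """

    def __init__(self, tau_low=0.90, tau_high=0.99, min_improvement=1e-3):
        if not 0.0 <= tau_low < tau_high <= 1.0:
            raise ValueError("require 0 <= tau_low < tau_high <= 1")
        self.tau_low = tau_low
        self.tau_high = tau_high
        self.min_improvement = min_improvement
        self.prev_loss = None
        self.threshold = tau_high  # start conservative: upload when in doubt

    def update(self, loss):
        """Feed one epoch's loss and return the threshold for the next epoch."""
        if self.prev_loss is not None:
            improved = (self.prev_loss - loss) >= self.min_improvement
            # Bang-bang: switch straight to one of the two extremes.
            self.threshold = self.tau_low if improved else self.tau_high
        self.prev_loss = loss
        return self.threshold


if __name__ == "__main__":
    ctrl = BangBangThreshold()
    for loss in [2.0, 1.5, 1.2, 1.1995, 1.0]:
        print(loss, ctrl.update(loss))
```

For the loss sequence above, the threshold goes 0.99, 0.9, 0.9, 0.99, 0.9: it tightens only in the epoch where the loss stalls. The DDPG-based scheme named in the abstract would replace this two-level switch with a learned policy that outputs a continuous threshold. That variant is not sketched here.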


Paper Structure (18 sections, 1 equation, 9 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Design of SplitCom
Overview
Similarity-aware Activation Reuse
Similarity Threshold Control
The Workflow of SplitCom
Extensions to Other SplitCom Variants
Extension to Bidirectional Compression
Extension to U-shape SplitCom
Implementations
Experimental Results
Effectiveness of SplitCom
Integration with INT8 quantization
Comparison of BBC and DDPG
...and 3 more sections

Figures (9)

Figure 1: Total uplink communication costs (for 10 clients) of LoRA aggregation and activation uploading until model convergence (GPT2 Small with 50 epochs and GPT2 XLarge with 10 epochs) under split federated LoRA fine-tuning. The E2E dataset is partitioned across these 10 clients.
Figure 2: Cosine similarity of activations between the current and previous epochs in split federated fine-tuning for two GPT2 models.
Figure 3: Comparison of the proposed method against traditional SFFT and traditional SFFT with INT4 activation quantization on the GPT-2 XLarge model fine-tuned over the E2E dataset with 10 clients, in terms of uplink communication overhead and model performance.
Figure 4: Overview of SplitCom.
Figure 5: Our SplitCom testbed composed of an RTX 4090 server and ten Jetson Orin NX clients.
...and 4 more figures
