FedSRD: Sparsify-Reconstruct-Decompose for Communication-Efficient Federated Large Language Models Fine-Tuning
Guochen Yan, Luyuan Xie, Qingni Shen, Yuejian Fang, Zhonghai Wu
TL;DR
This paper tackles the high communication cost of federated fine-tuning for large language models by introducing FedSRD, a Sparsify-Reconstruct-Decompose framework. It combines importance-aware sparsification on clients, full-rank reconstruction and aggregation on the server, and a Taylor-approximation-based alternating decomposition that produces a single sparse, low-rank update for download; an efficient variant, FedSRD-e, further reduces server computation. Empirical results across 10 benchmarks and two base models show up to a 90% reduction in round-trip communication while matching or exceeding baseline performance, particularly under non-IID data distributions. The approach enables robust, privacy-preserving collaborative fine-tuning on the decentralized Web, with the lightweight FedSRD-e variant demonstrating the practical trade-offs.
Abstract
The current paradigm of training large language models (LLMs) on publicly available Web data is becoming unsustainable, with high-quality data sources in specialized domains nearing exhaustion. Federated Learning (FL) emerges as a practical solution for the next generation of AI on a decentralized Web, enabling privacy-preserving collaborative fine-tuning by leveraging private data distributed across a global client base. While Low-Rank Adaptation (LoRA) is the standard for efficient fine-tuning, its application in federated settings presents a critical challenge: communication overhead remains a significant bottleneck across the Web's heterogeneous network conditions. The structural redundancy within LoRA parameters not only incurs a heavy communication burden but also introduces conflicts when aggregating client updates. To address this, we propose FedSRD, a Sparsify-Reconstruct-Decompose framework designed for communication-efficient federated LLM fine-tuning. We first introduce an importance-aware sparsification method that preserves the structural integrity of LoRA updates to reduce the uploaded parameter count. The server then reconstructs and aggregates these updates in a full-rank space to mitigate conflicts. Finally, it decomposes the global update into a sparse low-rank format for broadcast, ensuring a symmetrically efficient cycle. We also propose an efficient variant, FedSRD-e, to reduce computational overhead. Experimental results on 10 benchmarks demonstrate that our framework reduces communication costs by up to 90% while even improving model performance on heterogeneous client data.
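The sparsify and reconstruct steps described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. Plain top-k magnitude pruning stands in for the importance-aware sparsification, whose scoring rule the abstract does not specify, and the final decomposition step is left out for the same reason. The helper names (`matmul`, `mean_matrices`, `sparsify_top_k`, `count_nonzero`) and the toy client factors are hypothetical. The sketch shows why averaging LoRA factors separately differs from averaging the reconstructed updates B·A, which is one way to read the aggregation conflicts mentioned in the abstract.

```python
# Illustrative sketch only, not the FedSRD implementation.
# Assumptions: top-k magnitude pruning replaces the paper's importance-aware
# sparsification, and the decomposition step is omitted.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean_matrices(mats):
    """Element-wise mean of equally shaped matrices."""
    n = len(mats)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*mats)]

def sparsify_top_k(M, keep_ratio):
    """Zero all but the largest-magnitude entries (ties at the threshold are kept)."""
    flat = sorted((abs(v) for row in M for v in row), reverse=True)
    k = max(1, int(round(keep_ratio * len(flat))))
    threshold = flat[k - 1]
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in M]

def count_nonzero(M):
    """Number of entries that would actually need to be transmitted."""
    return sum(1 for row in M for v in row if v != 0.0)

# Two hypothetical clients, each holding rank-1 LoRA factors B (2x1) and A (1x2).
clients = [
    {"B": [[1.0], [0.0]], "A": [[1.0, 0.0]]},
    {"B": [[0.0], [1.0]], "A": [[0.0, 1.0]]},
]

# Naive aggregation: average B and A separately, then multiply.
naive = matmul(mean_matrices([c["B"] for c in clients]),
               mean_matrices([c["A"] for c in clients]))

# Reconstruct-then-aggregate: form each full update B @ A, then average.
exact = mean_matrices([matmul(c["B"], c["A"]) for c in clients])

print(naive)  # [[0.25, 0.25], [0.25, 0.25]]
print(exact)  # [[0.5, 0.0], [0.0, 0.5]]

# Client-side sparsification of a hypothetical dense factor before upload.
dense_B = [[0.9, 0.1], [-0.05, 0.8]]
sparse_B = sparsify_top_k(dense_B, keep_ratio=0.5)
print(sparse_B)                 # [[0.9, 0.0], [0.0, 0.8]]
print(count_nonzero(sparse_B))  # 2
```

In this toy case the naive result spreads weight into entries that no client updated, while the reconstructed average keeps each client's contribution intact. The reconstructed average has rank 2, so it no longer fits in rank-1 factors, which suggests why a dedicated decomposition step is needed before the update is broadcast.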
