
FedSRD: Sparsify-Reconstruct-Decompose for Communication-Efficient Federated Large Language Models Fine-Tuning

Guochen Yan, Luyuan Xie, Qingni Shen, Yuejian Fang, Zhonghai Wu

TL;DR

This paper tackles the high communication cost of federated fine-tuning for large language models by introducing FedSRD, a Sparsify-Reconstruct-Decompose framework. It combines importance-aware sparsification on the clients, full-rank reconstruction and aggregation on the server, and a Taylor-approximation-based alternating decomposition that produces a single sparse, low-rank update for download; an efficient variant, FedSRD-e, further reduces server-side computation. Experiments on 10 benchmarks with two base models (Llama3.2-3B and Qwen2-7B) show up to a 90% reduction in round-trip communication while matching or exceeding baseline performance, particularly under non-IID data distributions. The approach supports robust, privacy-preserving collaborative fine-tuning on the decentralized Web, with the lightweight FedSRD-e variant illustrating the practical trade-offs involved.
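
To make the "90% round-trip" figure concrete, the sketch below counts the values sent per round for a single LoRA-adapted d × k weight matrix. The dimensions, rank, kept-entry densities, and 2-byte value size are illustrative assumptions, sparse-index overhead is ignored, and the sketch makes no claim about how the paper itself accounts for communication cost. The function names are hypothetical.

```python
# Back-of-the-envelope payload arithmetic for one LoRA-adapted weight matrix.
# All dimensions, densities, and the 2-byte value size are illustrative assumptions.


def lora_params(d: int, k: int, r: int) -> int:
    """Number of values in the LoRA factors B (d x r) and A (r x k)."""
    return d * r + r * k


def payload_mb(num_values: float, bytes_per_value: int = 2) -> float:
    """Convert a value count to megabytes, assuming fp16/bf16 storage by default."""
    return num_values * bytes_per_value / 1e6


def round_trip_reduction(d: int, k: int, r: int,
                         upload_density: float, download_density: float) -> float:
    """Fraction of round-trip values saved versus sending dense LoRA factors both ways.

    A density is the (hypothetical) fraction of factor entries kept after sparsification.
    Index/metadata overhead of the sparse encoding is deliberately ignored.
    """
    dense = lora_params(d, k, r)
    baseline = 2 * dense  # dense upload + dense download
    sparse = upload_density * dense + download_density * dense
    return 1 - sparse / baseline


if __name__ == "__main__":
    d = k = 4096  # hypothetical hidden size
    r = 16        # hypothetical LoRA rank
    n = lora_params(d, k, r)
    print(n, payload_mb(n))                         # dense LoRA payload for this one matrix
    print(round_trip_reduction(d, k, r, 0.1, 0.1))  # roughly 0.9
```

Under these simplifications, equal upload and download densities give a round-trip saving of 1 minus the density, so keeping 10% of entries in each direction corresponds to a 90% reduction. Summing `lora_params` over every adapted matrix in a model is what pushes real payloads toward the "hundreds of megabytes per round" mentioned in Figure 1.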

Abstract

The current paradigm of training large language models (LLMs) on publicly available Web data is becoming unsustainable, with high-quality data sources in specialized domains nearing exhaustion. Federated Learning (FL) emerges as a practical solution for the next generation of AI on a decentralized Web, enabling privacy-preserving collaborative fine-tuning by leveraging private data distributed across a global client base. While Low-Rank Adaptation (LoRA) is the standard for efficient fine-tuning, its application in federated settings presents a critical challenge: communication overhead remains a significant bottleneck across the Web's heterogeneous network conditions. The structural redundancy within LoRA parameters not only incurs a heavy communication burden but also introduces conflicts when aggregating client updates. To address this, we propose FedSRD, a Sparsify-Reconstruct-Decompose framework designed for communication-efficient federated LLM fine-tuning. We first introduce an importance-aware sparsification method that preserves the structural integrity of LoRA updates while reducing the uploaded parameter count. The server then reconstructs and aggregates these updates in a full-rank space to mitigate conflicts. Finally, it decomposes the global update into a sparse low-rank format for broadcast, so that the download is compressed as well as the upload. We also propose an efficient variant, FedSRD-e, to reduce computational overhead. Experimental results on 10 benchmarks demonstrate that our framework reduces communication costs by up to 90% while also improving model performance on heterogeneous client data.
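
The aggregation conflict mentioned above can be made concrete with a small example. A minimal sketch, assuming uniform (unweighted) client averaging and plain-Python matrices: averaging LoRA factors separately computes mean(B)·mean(A), which mixes in cross terms $B_i A_j$ from different clients, whereas reconstructing each $\Delta W_i = B_i A_i$ and averaging in the full-rank space does not. The helper names are hypothetical and this is not the paper's code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product of x (m x n) and y (n x p)."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def mean(mats: List[Matrix]) -> Matrix:
    """Element-wise, unweighted average of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)] for i in range(rows)]


def factor_average(b_mats: List[Matrix], a_mats: List[Matrix]) -> Matrix:
    """Naive federated averaging of LoRA factors: mean(B) @ mean(A)."""
    return matmul(mean(b_mats), mean(a_mats))


def full_rank_average(b_mats: List[Matrix], a_mats: List[Matrix]) -> Matrix:
    """Reconstruct each client's Delta W_i = B_i @ A_i, then average in the full-rank space."""
    return mean([matmul(b, a) for b, a in zip(b_mats, a_mats)])


# Two hypothetical clients, each with a rank-1 update on a 2 x 2 weight.
B = [[[1.0], [0.0]], [[0.0], [1.0]]]
A = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(full_rank_average(B, A))  # [[0.5, 0.0], [0.0, 0.5]]
print(factor_average(B, A))     # [[0.25, 0.25], [0.25, 0.25]]
```

The factor average spreads mass onto the off-diagonal entries that neither client touched. The full-rank average is exact, but it has rank 2 and cannot be sent back as a single pair of rank-1 factors, which is why a decomposition step is needed before broadcast.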

Paper Structure

This paper contains 40 sections, 19 equations, 6 figures, 10 tables, and 1 algorithm.

Figures (6)

  • Figure 1: An illustration of Federated LoRA Fine-tuning. Clients train and communicate only the LoRA matrices. However, this payload can still amount to hundreds of megabytes per round, posing a significant bottleneck.
  • Figure 2: Performance comparison under different sparsity ratios. Both random and magnitude-based sparsification suffer performance decline at high sparsity ratios, motivating the need for a more advanced, structure-aware strategy.
  • Figure 3: The FedSRD framework. (1) Client-side sparsification: Each client computes its LoRA updates $\Delta B_i^t$ and $\Delta A_i^t$ and uses our importance-aware sparsification to generate sparse updates $\Delta B_{i, s}^t$ and $\Delta A_{i, s}^t$ for upload. (2) Server-side reconstruction and aggregation: The server reconstructs each client's full weight matrix $W_i^t$ and aggregates them in the full-rank space. (3) Server-side decomposition: The server computes the global update $\Delta W^t$, which is then decomposed into a single sparse matrix for efficient download.
  • Figure 4: Average in-domain performance vs. per-round communication cost (Llama3.2-3B). Our methods are Pareto optimal, achieving the highest performance with the lowest communication cost.
  • Figure 5: Average in-domain performance vs. per-round communication cost (Qwen2-7B). Our methods are Pareto optimal, achieving the highest performance with the lowest communication cost.
  • ...and 1 more figure
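
Step (3) of Figure 3 turns the aggregated full-rank update back into a compact download. The paper does this with a Taylor-approximation-based alternating decomposition whose details are not given in this summary, so the sketch below substitutes plain rank-1 alternating least squares and then sparsifies the factors with magnitude top-k, which is the magnitude-based baseline that Figure 2 reports as degrading at high sparsity. The function names (`rank1_als`, `top_k_magnitude`) are hypothetical; treat this as an illustration of "decompose, then sparsify", not as FedSRD's algorithm.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def outer(b: Vector, a: Vector) -> Matrix:
    return [[bi * aj for aj in a] for bi in b]


def rank1_als(w: Matrix, iters: int = 20) -> Tuple[Vector, Vector]:
    """Approximate W (d x k) by b a^T using plain alternating least squares.

    Stand-in only: this is not the paper's Taylor-approximation-based decomposition.
    """
    d, k = len(w), len(w[0])
    a = [1.0] * k  # deterministic init (assumed not orthogonal to W's top right singular vector)
    b = [0.0] * d
    for _ in range(iters):
        # Fix a, solve min_b ||W - b a^T||: b = W a / (a . a)
        aa = dot(a, a)
        b = [dot(w[i], a) / aa for i in range(d)]
        bb = dot(b, b)
        if bb == 0.0:  # W a vanished; nothing left to fit
            return b, [0.0] * k
        # Fix b, solve min_a ||W - b a^T||: a = W^T b / (b . b)
        a = [sum(w[i][j] * b[i] for i in range(d)) / bb for j in range(k)]
    return b, a


def top_k_magnitude(v: Vector, keep: int) -> Vector:
    """Zero all but the `keep` largest-magnitude entries (magnitude-based sparsification)."""
    if keep <= 0:
        return [0.0] * len(v)
    kept = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:keep])
    return [x if i in kept else 0.0 for i, x in enumerate(v)]


# Hypothetical aggregated update that happens to be exactly rank 1: [1, 2]^T [2, 1].
W = [[2.0, 1.0], [4.0, 2.0]]
b, a = rank1_als(W)
approx = outer(b, a)                        # recovers W up to floating-point error
b_sparse = top_k_magnitude(b, 1)            # keep one entry of each factor for download
a_sparse = top_k_magnitude(a, 1)
print(top_k_magnitude([0.1, -3.0, 2.0, 0.5], 2))  # [0.0, -3.0, 2.0, 0.0]
```

A real aggregated update is generally not rank 1, so a rank-r decomposition would be used in practice; rank 1 keeps the closed-form alternating updates readable without a linear-algebra library.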