Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning

Yebo Wu, Jingguang Li, Zhijiang Guo, Li Li

TL;DR

SmartFed tackles the high resource costs of federated fine-tuning for large language models by reusing existing LoRA modules through a trainable router and introducing a fine-grained, rank-wise knowledge fusion mechanism (MoRE). It further optimizes resource use with Elastic Expert Quota Allocation (EEQA), which adaptively allocates activation quotas to rank-wise experts based on their contribution. Extensive experiments across multiple models and diverse tasks show that SmartFed delivers consistent accuracy improvements, significantly faster convergence, and substantial reductions in communication and energy costs compared to baselines. This approach enables scalable, privacy-preserving fine-tuning on edge devices by leveraging public LoRA modules and sparse, input-conditioned routing.

Abstract

Federated fine-tuning offers a promising solution for adapting Large Language Models (LLMs) to downstream tasks while safeguarding data privacy. However, its high computational and communication demands hinder its deployment on resource-constrained devices. In this paper, we propose SmartFed, a resource-efficient federated fine-tuning framework. SmartFed intelligently reuses knowledge embedded in existing LoRA modules, eliminating the need for expensive training from scratch when adapting LLMs to new tasks. To effectively exploit this knowledge and ensure scalability, we introduce the Mixture of Rank-Wise Experts (MoRE). MoRE decomposes LoRA modules into fine-grained rank-level experts. These experts are selectively activated and combined based on input semantics and resource budgets. Moreover, to optimize resource utilization, we present the Elastic Expert Quota Allocation (EEQA). EEQA adaptively allocates expert capacity across parameter matrices based on their contribution to model performance, focusing computing resources on the critical experts. Extensive evaluations across multiple benchmarks demonstrate that SmartFed significantly outperforms existing methods in model performance and training efficiency.
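
To make the rank-wise idea concrete, here is a minimal sketch, not the authors' implementation. It relies on the fact that a LoRA update B·A of rank r is a sum of r rank-1 terms, so each (column of B, row of A) pair can be treated as one expert. The linear router, the top-k selection with softmax weights, and all function and parameter names (split_into_rank_experts, more_forward, router_weights, k) are assumptions made for illustration; the abstract does not specify the routing rule.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def split_into_rank_experts(B, A):
    """Split a LoRA pair into rank-1 experts.

    B is d_out x r and A is r x d_in (lists of rows).
    Expert i is the pair (column i of B, row i of A).
    """
    r = len(A)
    return [([row[i] for row in B], A[i]) for i in range(r)]

def more_forward(x, experts, router_weights, k):
    """Combine the top-k rank-wise experts for input x.

    Assumption: the router is linear, with one weight vector per expert,
    and the selected experts are mixed with softmax weights.
    Returns the combined update vector and the chosen expert indices.
    """
    scores = [dot(w, x) for w in router_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract the max for numerical stability
    exp_s = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exp_s.values())
    d_out = len(experts[0][0])
    delta = [0.0] * d_out
    for i in top:
        b, a = experts[i]
        coeff = (exp_s[i] / z) * dot(a, x)  # gate weight times (a_i . x)
        for j in range(d_out):
            delta[j] += coeff * b[j]
    return delta, top

# Pool the rank-wise experts of two toy LoRA modules (rank 2 each).
pool = (split_into_rank_experts([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]])
        + split_into_rank_experts([[0.5, 0.0], [0.0, 0.5]], [[1.0, 1.0], [1.0, -1.0]]))
router = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, -0.2]]  # hypothetical weights
update, chosen = more_forward([1.0, 2.0], pool, router, k=2)
print(chosen, update)
```

In this sketch only the router weights would be trained; the expert vectors stay frozen, which mirrors the paper's stated goal of reusing existing LoRA modules rather than training new ones. The parameter k stands in for the resource budget.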

Paper Structure

This paper contains 23 sections, 13 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Illustration of the classic federated fine-tuning framework versus our SmartFed. Classic methods train LoRA modules from scratch (a), while SmartFed updates only the router (b).
  • Figure 2: Comparison of three knowledge reuse strategies using two task-specific LoRA modules: (a) synthesizing a single LoRA module through linear arithmetic, (b) integrating information from entire LoRA modules, and (c) integrating information from rank-wise components.
  • Figure 3: Performance and inference latency comparison of different knowledge reuse strategies.
  • Figure 4: Heterogeneous importance of rank-wise experts. (a) Importance distribution of the first-layer Query matrix for Chinese and Math LoRA modules. (b) Importance distribution of the first-layer Query and Value matrices for the Math LoRA module. (c) Importance distribution of the Query matrix across the first and second layers (Math LoRA).
  • Figure 5: Impact of expert quota allocation on model performance.
  • ...and 11 more figures
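
The captions of Figures 4 and 5 indicate that rank-wise experts differ in importance across tasks, matrices, and layers, and that quota allocation affects performance. The sketch below shows one simple way such a quota could follow importance: a fixed activation budget is divided across parameter matrices in proportion to an importance score, with largest-remainder rounding. This is an illustrative assumption, not the paper's EEQA rule; the importance values, the matrix names, and the function allocate_quotas are hypothetical.

```python
def allocate_quotas(importance, total_budget):
    """Divide an integer budget of active experts across matrices.

    importance maps a matrix name to a non-negative score (hypothetical
    metric). Each matrix receives a share proportional to its score;
    leftover slots go to the largest fractional remainders.
    """
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    raw = {name: total_budget * score / total for name, score in importance.items()}
    quotas = {name: int(value) for name, value in raw.items()}  # floor of each share
    leftover = total_budget - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - quotas[n], reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return quotas

# Hypothetical scores for three matrices of one layer, budget of 8 active experts.
print(allocate_quotas({"query": 3.0, "key": 1.0, "value": 4.0}, 8))
```

A proportional rule like this concentrates the budget on matrices with higher scores while keeping the total number of active experts fixed.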