FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA

Jieming Bian, Lei Wang, Letian Zhang, Jie Xu

TL;DR

FedALT tackles the problem of cross-client interference in federated fine-tuning of large language models by decoupling local personalization from global knowledge through two LoRA components per client: an updateable Individual LoRA and a frozen Rest-of-World (RoW) LoRA that aggregates knowledge from other clients. An input-specific adaptive Mixture-of-Experts mixer dynamically weights the two components, enabling personalized adaptation while leveraging global information in a controlled manner. Empirical results on Bloom-560M and Llama 2-7B across diverse NLP tasks show FedALT outperforms FedAvg-based and other personalized federated LoRA methods, with robustness to varying numbers of clients, LoRA ranks, and local epochs. The approach reduces harmful interference, maintains computational efficiency, and offers a practical path for privacy-preserving fine-tuning of heterogeneous client data in real-world NLP applications.
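As a rough illustration of how a Rest-of-World component could be built for each client, the sketch below computes a leave-one-out average of the clients' LoRA parameters. The averaging rule, the function name `rest_of_world_average`, and the flattened-list representation of parameters are assumptions made for this example; the summary above only says that the RoW LoRA aggregates knowledge from other clients.

```python
def rest_of_world_average(client_params):
    """Leave-one-out mean (an assumed aggregation rule, not confirmed by the text).

    client_params: list of equal-length lists, one flattened LoRA
    parameter vector per client. Returns one RoW vector per client,
    each averaging the parameters of all *other* clients.
    """
    n = len(client_params)
    if n < 2:
        raise ValueError("need at least two clients to form a rest-of-world average")
    dim = len(client_params[0])
    # Sum each coordinate once, then subtract the client's own value.
    totals = [sum(p[i] for p in client_params) for i in range(dim)]
    return [[(totals[i] - p[i]) / (n - 1) for i in range(dim)] for p in client_params]


# Three hypothetical clients, each holding a 2-parameter LoRA update.
clients = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
row = rest_of_world_average(clients)
print(row)  # client 0 receives the mean of clients 1 and 2, and so on
```

Under this assumption a client's own parameters never enter its RoW component, so the RoW LoRA can stay frozen during local training while the Individual LoRA continues to be updated.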

Abstract

Fine-tuning large language models (LLMs) in federated settings enables privacy-preserving adaptation but suffers from cross-client interference due to model aggregation. Existing federated LoRA fine-tuning methods, primarily based on FedAvg, struggle with data heterogeneity, leading to harmful cross-client interference and suboptimal personalization. In this work, we propose FedALT, a novel personalized federated LoRA fine-tuning algorithm that fundamentally departs from FedAvg. Instead of using an aggregated model to initialize local training, each client continues training its individual LoRA while incorporating shared knowledge through a separate Rest-of-World (RoW) LoRA component. To effectively balance local adaptation and global information, FedALT introduces an adaptive mixer that dynamically learns input-specific weightings between the individual and RoW LoRA components, drawing conceptual foundations from the Mixture-of-Experts (MoE) paradigm. Through extensive experiments on NLP benchmarks, we demonstrate that FedALT significantly outperforms state-of-the-art personalized federated LoRA fine-tuning methods, achieving superior local adaptation without sacrificing computational efficiency.
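The input-specific weighting described in the abstract can be pictured with a minimal forward-pass sketch. It assumes a two-way softmax gate computed from the layer input, in the spirit of MoE routing; the gate form, the function names, and the tiny rank-1 matrices are all hypothetical and are not taken from the paper.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a short list."""
    top = max(z)
    exps = [math.exp(t - top) for t in z]
    total = sum(exps)
    return [e / total for e in exps]


def lora_delta(a, b, x):
    """Low-rank update B (A x) for LoRA factors A and B."""
    return matvec(b, matvec(a, x))


def mixed_forward(x, w0, individual, rest_of_world, gate):
    """y = W0 x + g_ind * B_ind A_ind x + g_row * B_row A_row x.

    `individual` and `rest_of_world` are (A, B) pairs. In training,
    only the individual pair and the gate would be updated; the
    rest-of-world pair and W0 stay frozen (assumed setup).
    """
    g_ind, g_row = softmax(matvec(gate, x))  # weights depend on the input x
    base = matvec(w0, x)
    d_ind = lora_delta(individual[0], individual[1], x)
    d_row = lora_delta(rest_of_world[0], rest_of_world[1], x)
    y = [b + g_ind * di + g_row * dr for b, di, dr in zip(base, d_ind, d_row)]
    return y, (g_ind, g_row)


# Toy 2-dimensional layer with rank-1 LoRA factors.
w0 = [[1.0, 0.0], [0.0, 1.0]]
individual = ([[1.0, 0.0]], [[1.0], [0.0]])     # acts on the first coordinate
rest_of_world = ([[0.0, 1.0]], [[0.0], [1.0]])  # acts on the second coordinate
gate = [[0.0, 0.0], [0.0, 0.0]]                 # zero gate gives equal weights

y, weights = mixed_forward([2.0, 4.0], w0, individual, rest_of_world, gate)
print(y, weights)
```

With a zero gate the two components are weighted equally; a trained gate would shift weight toward whichever component helps more for a given input, which is the controlled use of global information the abstract refers to.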

Paper Structure

This paper contains 31 sections, 1 theorem, 35 equations, 5 figures, 14 tables, 1 algorithm.

Key Result

Theorem 1

Under the stated assumptions, if $\beta < 1$ and the local training at each round converges to a neighborhood of the optimal solution, then the sequence of trainable parameters $\{Z_k^t\}$ generated by FedALT converges to a stable point for each client $k$.

Figures (5)

  • Figure 1: Illustration of FedALT. Instead of directly aggregating local LoRA modules from each client using FedAvg, FedALT introduces a frozen RoW LoRA component to transmit shared global knowledge while preserving client-specific adaptations through Individual LoRA. The adaptive mixer dynamically combines the RoW LoRA and Individual LoRA.
  • Figure 2: Motivational study results.
  • Figure 2: Impact of decoupling LoRA training.
  • Figure 3: Ablation studies of FedALT.
  • Figure 4: Sensitivity Analyses of FedALT under different training configurations.

Theorems & Definitions (2)

  • Theorem 1: Convergence of FedALT
  • proof