
Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures

Yicheng Zhang, Zhen Qin, Zhaomin Wu, Jian Hou, Shuiguang Deng

TL;DR

This work proposes FedAMoLE, a personalized federated fine-tuning framework for LLMs that enables data-driven heterogeneity in model architectures through a heterogeneous mixture of LoRA experts (HMoLE) module and a reverse selection-based expert assignment (RSEA) strategy. By injecting lightweight LoRA-based experts into the decoder layers and aligning them with client data through RSEA, FedAMoLE outperforms existing approaches across seven non-IID scenarios while keeping communication and computation overhead scalable. Empirical results show an average improvement of 5.97% over strong baselines, with notable gains on highly heterogeneous domains. The approach also remains practical, supporting optional differential privacy (DP) and efficient MILP-based expert assignment. Together, architectural personalization, data-driven expert selection, and privacy-aware design offer a practical path for federated LLM fine-tuning in real-world, cross-organizational settings.
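To make the idea of "LoRA-based experts with heterogeneous architectures" concrete, here is a minimal sketch of one HMoLE-style module as we read the description: a frozen projection, one shared LoRA expert, and a client-specific subset of domain experts drawn from a global pool and mixed by a softmax router. The class and function names, the router design, and the initialisation are hypothetical illustrations, not the authors' implementation. The point it shows is that two clients can host different expert sets while still sharing parameters for any expert they both hold.

```python
import math
import random


def matvec(M, x):
    # Multiply matrix M (list of rows) by vector x.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def rand_matrix(rows, cols, scale, rng):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class LoRAExpert:
    """Low-rank update delta(x) = B @ (A @ x) with rank r << d (scaling omitted)."""

    def __init__(self, d, r, rng):
        self.A = rand_matrix(r, d, 0.02, rng)  # down-projection, r x d
        # LoRA usually zero-initialises B; random here so the toy output is non-trivial.
        self.B = rand_matrix(d, r, 0.02, rng)  # up-projection, d x r

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))


class HeterogeneousMoLE:
    """Hypothetical sketch: frozen W plus a shared expert plus a routed,
    client-specific subset of domain experts."""

    def __init__(self, W, shared, expert_pool, assigned_ids, rng):
        self.W = W  # frozen pretrained projection (e.g. Q or V)
        self.shared = shared
        self.experts = {i: expert_pool[i] for i in assigned_ids}
        d = len(W[0])
        # One gate vector per assigned expert (assumed linear router, no bias).
        self.router = {i: [rng.gauss(0.0, 0.02) for _ in range(d)] for i in assigned_ids}

    def __call__(self, x):
        ids = sorted(self.experts)
        logits = [sum(w * xi for w, xi in zip(self.router[i], x)) for i in ids]
        gates = softmax(logits)
        out = [o + s for o, s in zip(matvec(self.W, x), self.shared(x))]
        for g, i in zip(gates, ids):
            out = [o + g * dl for o, dl in zip(out, self.experts[i](x))]
        return out, dict(zip(ids, gates))


rng = random.Random(0)
d, r = 8, 2
W_q = rand_matrix(d, d, 0.1, rng)                   # stand-in for a frozen projection
shared = LoRAExpert(d, r, rng)
pool = [LoRAExpert(d, r, rng) for _ in range(4)]    # global pool of domain experts
client_a = HeterogeneousMoLE(W_q, shared, pool, [0, 2], rng)     # hosts 2 experts
client_b = HeterogeneousMoLE(W_q, shared, pool, [1, 2, 3], rng)  # hosts 3 experts
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y_a, gates_a = client_a(x)
y_b, gates_b = client_b(x)
```

Because both clients reference the same expert objects from the pool, expert 2 is literally shared between them, which is what would let a server aggregate updates per expert even though the clients' module architectures differ.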

Abstract

Large language models (LLMs) increasingly power web-based applications, whose effectiveness relies on fine-tuning with large-scale instruction data. However, such data often contains valuable or sensitive information, which limits its public sharing among business organizations. Federated learning (FL) enables collaborative fine-tuning of LLMs without accessing raw data. Existing approaches to federated LLM fine-tuning usually adopt a uniform model architecture, which makes it difficult to fit highly heterogeneous client-side data spanning different domains and tasks. For example, hospitals and financial institutions conducting federated fine-tuning together may require different LLM architectures because their domains and tasks differ. To address this, we propose FedAMoLE, a lightweight personalized FL framework that enables data-driven heterogeneous model architectures. It features a heterogeneous mixture of low-rank adaptation (LoRA) experts module to aggregate architecturally heterogeneous models and a reverse selection-based expert assignment strategy to tailor model architectures to each client based on its data distribution. Experiments across seven scenarios demonstrate that FedAMoLE improves client-side performance by an average of 5.97% over existing approaches while maintaining practical memory, communication, and computation overhead.
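The abstract describes reverse selection-based expert assignment only at a high level. Below is a toy sketch of the reverse-selection idea as we read it: rather than each client picking its favourite experts (which could pile every client onto the same popular expert), each domain expert picks clients, subject to balanced load, and the total expert-to-client affinity is maximised. The score matrix, the exact-load constraints (each expert picks `k` clients, each client ends up with `m` experts), and the function name `reverse_select` are assumptions; the exhaustive search stands in for the MILP solver mentioned in the TL;DR and is only feasible at toy sizes.

```python
from itertools import combinations, product


def reverse_select(scores, k, m):
    """Hypothetical RSEA stand-in.

    scores[e][c] is an assumed affinity of domain expert e for client c.
    Every expert selects exactly k clients; every client must receive
    exactly m experts; the summed score of selected pairs is maximised.
    """
    n_experts, n_clients = len(scores), len(scores[0])
    if n_experts * k != n_clients * m:
        raise ValueError("infeasible: experts * k must equal clients * m")
    options = list(combinations(range(n_clients), k))  # client sets one expert may pick
    best, best_val = None, float("-inf")
    for choice in product(options, repeat=n_experts):  # one client set per expert
        load = [0] * n_clients
        for clients in choice:
            for c in clients:
                load[c] += 1
        if any(count != m for count in load):
            continue  # violates the per-client expert count
        val = sum(scores[e][c] for e, clients in enumerate(choice) for c in clients)
        if val > best_val:
            best, best_val = choice, val
    if best is None:
        raise ValueError("no feasible assignment")
    # Invert to the client view: which experts each client hosts.
    assignment = {c: [] for c in range(n_clients)}
    for e, clients in enumerate(best):
        for c in clients:
            assignment[c].append(e)
    return assignment, best_val


# 4 experts, 2 clients: experts 0 and 1 fit client 0, experts 2 and 3 fit client 1.
scores = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.4, 0.6],
]
assignment, total = reverse_select(scores, k=1, m=2)
print(assignment)  # {0: [0, 1], 1: [2, 3]}
```

In a realistic setting the same objective and constraints would be handed to an integer-programming solver instead of enumerated, since the search space grows exponentially with the number of experts.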

Paper Structure

This paper contains 30 sections, 17 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Scenario examples of federated LLM fine-tuning among three institutes, each holding data that differs in domain and tasks. To better adapt to local data, personalized models with heterogeneous architectures tailored to local data distributions are usually preferred (panel b).
  • Figure 2: Effectiveness of heterogeneous and data-driven architectures. FedIT-FT and FedAMoLE-R use homogeneous and heterogeneous architectures (Hetero Arch), respectively. FedAMoLE further adopts data-driven architecture optimization. (See the ablation study section for details.)
  • Figure 3: Overview of FedAMoLE in a single round. Each client has a transformer-based local model with $L$ decoder layers, where each layer includes a self-attention block (with parameters $\mathbf{Q}$, $\mathbf{K}$, $\mathbf{V}$, and $\mathbf{O}$) and an FFN block. Trainable HMoLE modules (see Figure 4) are injected into $\mathbf{Q}$ and $\mathbf{V}$ for fine-tuning, while $\mathbf{K}$, $\mathbf{O}$, and FFN remain frozen. Step ➅ denotes the RSEA strategy (see Figure 5). Components of the same type (e.g., all shared experts) are shown with an identical color and texture.
  • Figure 4: Routing of HMoLE module $m$ at client $i$.
  • Figure 5: RSEA process. $E$ and $C$ denote the number of domain experts and clients, respectively.
  • ...and 6 more figures
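Figure 3 places trainable HMoLE modules only on $\mathbf{Q}$ and $\mathbf{V}$ in each of the $L$ decoder layers. To get a feel for why this keeps overhead small, here is a back-of-the-envelope count of trainable parameters per client under a hypothetical configuration: hidden size 4096, 32 layers, square Q/V projections, LoRA rank 8, one shared and two domain experts per module, and a bias-free linear router. None of these values come from the paper; they are assumptions chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Config:
    hidden: int = 4096         # d_model (hypothetical)
    layers: int = 32           # L decoder layers (hypothetical)
    rank: int = 8              # LoRA rank r (assumed)
    assigned_experts: int = 2  # domain experts hosted by this client (assumed)
    shared_experts: int = 1    # shared experts per module (assumed)
    targets: int = 2           # projections carrying HMoLE: Q and V


def lora_params(d_in, d_out, r):
    # A is r x d_in, B is d_out x r.
    return r * (d_in + d_out)


def hmole_trainable(cfg):
    d = cfg.hidden
    per_expert = lora_params(d, d, cfg.rank)
    experts = (cfg.shared_experts + cfg.assigned_experts) * per_expert
    router = d * cfg.assigned_experts  # assumed linear gate over domain experts only
    return cfg.layers * cfg.targets * (experts + router)


def full_qv_params(cfg):
    # Frozen Q and V weights, assumed square (hidden x hidden).
    return cfg.layers * cfg.targets * cfg.hidden * cfg.hidden


cfg = Config()
trainable = hmole_trainable(cfg)
full = full_qv_params(cfg)
print(f"HMoLE trainable: {trainable:,}")   # HMoLE trainable: 13,107,200
print(f"Full Q/V weights: {full:,}")       # Full Q/V weights: 1,073,741,824
print(f"ratio: {trainable / full:.2%}")    # ratio: 1.22%
```

Under these assumptions the trainable HMoLE parameters are roughly 1.2% of the frozen Q and V weights they adapt, and this count also bounds what a client would upload per round if every trainable tensor were transmitted.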

Theorems & Definitions (1)

  • Definition 1