Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures
Yicheng Zhang, Zhen Qin, Zhaomin Wu, Jian Hou, Shuiguang Deng
TL;DR
This work proposes FedAMoLE, a personalized federated fine-tuning framework for LLMs that lets each client's model architecture be shaped by its data. It does so through a heterogeneous mixture of LoRA experts (HMoLE) module and a reverse selection-based expert assignment (RSEA) strategy. FedAMoLE injects lightweight LoRA-based experts into the decoder layers and uses RSEA to match them to client data. Across seven non-IID scenarios it improves client-side performance by an average of 5.97% over strong baselines, with notable gains on highly heterogeneous domains, while keeping communication and compute overhead scalable. The approach remains practical with optional differential privacy (DP) and an efficient expert assignment based on mixed-integer linear programming (MILP). Together, architectural personalization, data-driven expert selection, and privacy-aware design offer a practical path to federated LLM fine-tuning in real-world, cross-organizational settings.
Abstract
Large language models (LLMs) increasingly power web-based applications, whose effectiveness relies on fine-tuning with large-scale instruction data. However, such data often contains valuable or sensitive information, which limits public sharing among business organizations. Federated learning (FL) enables collaborative fine-tuning of LLMs without access to raw data. Existing approaches to federated LLM fine-tuning usually adopt a uniform model architecture, which makes it difficult to fit highly heterogeneous client-side data across domains and tasks. For example, hospitals and financial institutions conducting federated fine-tuning may require different LLM architectures because their domains and tasks differ. To address this, we propose FedAMoLE, a lightweight personalized FL framework that enables data-driven heterogeneous model architectures. It features a heterogeneous mixture of low-rank adaptation (LoRA) experts module to aggregate architecturally heterogeneous models, and a reverse selection-based expert assignment strategy to tailor the model architecture of each client to its data distribution. Experiments across seven scenarios demonstrate that FedAMoLE improves client-side performance by an average of 5.97% over existing approaches while maintaining practical memory, communication, and computation overhead.
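To make the idea of a mixture of LoRA experts concrete, the following is a minimal sketch of a generic layer of that kind: a frozen weight matrix plus a gated sum of low-rank updates, with a different number of experts per client. It is not the authors' HMoLE or RSEA implementation. The class names `LoRAExpert` and `MoLELayer`, the softmax router, and all dimensions are assumptions made for illustration only.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(t - top) for t in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random Gaussian matrix (illustrative initialization)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAExpert:
    """Hypothetical low-rank expert: contributes B @ (A @ x)."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = rand_matrix(rank, d_in, rng)
        # Zero-initialized B, so a fresh expert leaves the output unchanged.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def delta(self, x):
        return matvec(self.B, matvec(self.A, x))


class MoLELayer:
    """Hypothetical layer: frozen W plus a softmax-gated sum of LoRA experts."""

    def __init__(self, W, experts, router):
        self.W = W              # frozen base weights, shared by all clients
        self.experts = experts  # per-client subset; its size may differ
        self.router = router    # one row of router weights per expert

    def forward(self, x):
        out = matvec(self.W, x)
        if not self.experts:
            return out
        gates = softmax(matvec(self.router, x))
        for gate, expert in zip(gates, self.experts):
            out = [o + gate * d for o, d in zip(out, expert.delta(x))]
        return out


# Two clients draw different numbers of experts from one shared pool
# (assumed setup; the actual assignment strategy is not modeled here).
rng = random.Random(0)
W = rand_matrix(3, 4, rng)
pool = [LoRAExpert(d_in=4, d_out=3, rank=2, rng=rng) for _ in range(3)]
client_a = MoLELayer(W, pool, rand_matrix(3, 4, rng))
client_b = MoLELayer(W, pool[1:2], rand_matrix(1, 4, rng))
x = [1.0, -2.0, 0.5, 3.0]
y_a, y_b = client_a.forward(x), client_b.forward(x)
```

Because each client holds its own router and its own subset of experts, the two layers share the frozen base weights but differ in architecture, which is the kind of heterogeneity the abstract describes. How experts are matched to clients is left out of this sketch.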
