Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models
Zihan Fang, Zheng Lin, Zhe Chen, Xianhao Chen, Yue Gao, Yuguang Fang
TL;DR
This paper addresses privacy-preserving fine-tuning of large language models (LLMs) in federated settings with heterogeneous edge resources. It introduces FedPipe, an automated pipeline driven by mixed-integer linear programming (MILP) that selects important trainable weights via singular value decomposition (SVD) on low-rank adapters (LoRA), configures per-edge adapter ranks and batch sizes under local budgets, and uses memory-aware quantization along with partial adapter aggregation to minimize communication overhead without adding inference latency. The approach yields faster convergence, higher accuracy, and far fewer trainable parameters than baseline methods such as Vanilla Fine-Tuning, LoRA, and FedAdapter, as demonstrated on LLaMA2-7B and GPT-2 with the Alpaca and E2E datasets. These results suggest that FedPipe enables scalable, privacy-preserving LLM fine-tuning on diverse edge devices, reducing training cost and memory requirements while maintaining or improving task performance.
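To make the idea of SVD-based weight selection concrete, the following is a minimal sketch rather than FedPipe's actual criterion. It assumes each candidate weight has a LoRA adapter pair (B, A), scores the pair by the sum of the singular values of B @ A, and keeps the top-k. The scoring rule, the function names `importance` and `select_top_k`, and the layer names are illustrative assumptions not taken from the paper.

```python
import numpy as np

def importance(B, A):
    """Assumed score: sum of singular values (nuclear norm) of the update B @ A."""
    singular_values = np.linalg.svd(B @ A, compute_uv=False)
    return float(singular_values.sum())

def select_top_k(adapters, k):
    """Rank adapters by the assumed score and return the k highest-scoring names."""
    scores = {name: importance(B, A) for name, (B, A) in adapters.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Toy data: one shared random adapter, rescaled per (hypothetical) layer name
# so that the ordering of the scores is known in advance.
rng = np.random.default_rng(0)
B0 = rng.normal(size=(16, 4))   # d_out x r
A0 = rng.normal(size=(4, 12))   # r x d_in
scales = {"q_proj": 1.0, "k_proj": 0.1, "v_proj": 2.0}
adapters = {name: (s * B0, A0) for name, s in scales.items()}

selected, scores = select_top_k(adapters, k=2)
print(selected)
```

In this toy setup the score grows linearly with the scale factor, so the two largest-scale adapters are selected.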
Abstract
Recently, there has been a surge in the development of advanced intelligent generative content (AIGC), especially large language models (LLMs). However, for many downstream tasks, it is necessary to fine-tune LLMs using private data. While federated learning offers a promising privacy-preserving solution to LLM fine-tuning, the substantial size of an LLM, combined with its high computational and communication demands, makes federated learning hard to apply to downstream tasks. More importantly, private edge servers often possess varying computing and network resources in real-world scenarios, which further complicates LLM fine-tuning. To tackle these problems, we design and implement an automated federated pipeline, named FedPipe, to fine-tune LLMs with minimal training cost and without adding any inference latency. FedPipe first identifies the weights to be fine-tuned based on their contributions to LLM training. It then configures a low-rank adapter for each selected weight, trains the local low-rank adapters on each edge server, and aggregates the local adapters of all edge servers to fine-tune the whole LLM. Finally, it quantizes the parameters of the LLM to reduce memory usage according to the requirements of the edge servers. Extensive experiments demonstrate that FedPipe expedites model training and achieves higher accuracy than state-of-the-art benchmarks.
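The claim that adapters add no inference latency rests on a general property of LoRA: the low-rank update can be folded into the frozen weight once training is done. The sketch below illustrates this with a simplified aggregation step. It is not FedPipe's aggregation rule: it assumes each edge server k holds an adapter (B_k, A_k) of its own rank, forms the dense update B_k A_k / r_k, and averages these updates weighted by local sample counts. The function names, the scaling factor, and the sample counts are assumptions made for illustration.

```python
import numpy as np

def lora_delta(B, A, alpha=None):
    """Dense update encoded by one adapter: (alpha / r) * B @ A, with alpha = r by default."""
    r = A.shape[0]
    scale = 1.0 if alpha is None else alpha / r
    return scale * (B @ A)

def aggregate_deltas(adapters, num_samples):
    """Sample-weighted average of the dense updates (a FedAvg-style assumption)."""
    weights = np.asarray(num_samples, dtype=float)
    weights = weights / weights.sum()
    return sum(w * lora_delta(B, A) for w, (B, A) in zip(weights, adapters))

rng = np.random.default_rng(0)
d_out, d_in = 16, 12
W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight

# Three edge servers with different (hypothetical) adapter ranks and data sizes.
ranks = [2, 4, 1]
adapters = [(rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))) for r in ranks]
num_samples = [100, 300, 50]

delta = aggregate_deltas(adapters, num_samples)
W_merged = W + delta                        # merge once, before deployment

# After merging, inference is a single matmul, as in the original model.
x = rng.normal(size=d_in)
y_merged = W_merged @ x
y_adapter_path = W @ x + delta @ x          # the unmerged, two-branch computation
print(np.allclose(y_merged, y_adapter_path))
```

Averaging the dense products sidesteps the shape mismatch between adapters of different ranks, at the cost of communicating a full-size update; a practical system would exchange only the low-rank factors.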
