Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models

Zihan Fang; Zheng Lin; Zhe Chen; Xianhao Chen; Yue Gao; Yuguang Fang

TL;DR

This paper addresses privacy-preserving fine-tuning of large language models (LLMs) in federated settings with heterogeneous edge resources. It introduces FedPipe, an MILP-driven automated pipeline that selects important trainable weights via SVD on low-rank adapters (LoRA), configures per-edge adapter ranks and batch sizes under local budgets, and uses memory-aware quantization along with partial adapter aggregation to minimize communication overhead and avoid inference latency. The approach yields faster convergence, higher accuracy, and far fewer trainable parameters than baseline methods like Vanilla Fine-Tuning, LoRA, and FedAdapter, as demonstrated on LLaMA2-7B and GPT-2 with Alpaca and E2E datasets. These results suggest FedPipe enables scalable, privacy-preserving LLM fine-tuning on diverse edge devices, reducing training cost and memory requirements while maintaining or improving task performance.

Abstract

Recently, there has been a surge in the development of advanced intelligent generative content (AIGC), especially large language models (LLMs). However, many downstream tasks require fine-tuning LLMs on private data. While federated learning offers a promising privacy-preserving solution to LLM fine-tuning, the substantial size of an LLM, combined with high computational and communication demands, makes it hard to apply to downstream tasks. More importantly, private edge servers often possess varying computing and network resources in real-world scenarios, which further complicates LLM fine-tuning. To tackle these problems, we design and implement an automated federated pipeline, named FedPipe, to fine-tune LLMs with minimal training cost and without adding any inference latency. FedPipe first identifies the weights to be fine-tuned based on their contributions to the LLM training. It then configures a low-rank adapter for each selected weight, trains the local low-rank adapters on each edge server, and aggregates the local adapters of all edge servers to fine-tune the whole LLM. Finally, it quantizes the parameters of the LLM to reduce memory usage according to the requirements of each edge server. Extensive experiments demonstrate that FedPipe expedites model training and achieves higher accuracy than state-of-the-art benchmarks.
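The adapter mechanics summarized in the abstract can be illustrated with a small NumPy sketch. It is not FedPipe's implementation: the function names, the delta-averaging rule used to combine adapters of different ranks, and the singular-value importance score are all assumptions made for illustration. It only shows why a low-rank adapter trains few parameters, and why merging the aggregated update into the frozen weight adds no inference latency.

```python
import numpy as np


def init_lora(d_out, d_in, rank, rng):
    """Create a LoRA pair (B, A); B starts at zero so the update B @ A is zero."""
    B = np.zeros((d_out, rank))
    A = rng.normal(scale=0.01, size=(rank, d_in))
    return B, A


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of one adapter: rank * (d_out + d_in)."""
    return rank * (d_out + d_in)


def aggregate_deltas(adapters, weights):
    """Weighted average of the per-server updates B_i @ A_i.

    ASSUMPTION: averaging the dense products is a simple way to combine
    adapters whose ranks differ; it is not taken from the paper.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * (B @ A) for w, (B, A) in zip(weights, adapters))


def merge(W, delta):
    """Fold the update into the frozen weight; the result has W's shape,
    so inference uses a single matrix multiply as before."""
    return W + delta


def importance(delta):
    """HYPOTHETICAL importance score: sum of singular values of the update."""
    return float(np.linalg.svd(delta, compute_uv=False).sum())


rng = np.random.default_rng(0)
d_out, d_in = 8, 6
W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight

# Two edge servers with different adapter ranks (heterogeneous budgets).
adapters = []
for rank in (2, 4):
    B, A = init_lora(d_out, d_in, rank, rng)
    B = rng.normal(scale=0.01, size=B.shape)  # stand-in for local training
    adapters.append((B, A))

delta = aggregate_deltas(adapters, weights=[100, 300])  # e.g. local sample counts
W_merged = merge(W, delta)
print(W_merged.shape, lora_param_count(d_out, d_in, 2), importance(delta))
```

In this toy setting a rank-2 adapter trains 28 values against 48 in the full 8×6 weight; the gap widens quickly at realistic layer sizes.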

Paper Structure (19 sections, 12 equations, 16 figures, 5 tables, 1 algorithm)

This paper contains 19 sections, 12 equations, 16 figures, 5 tables, 1 algorithm.

Introduction
Background and Motivation
Injecting PEFT into Federated Learning on LLMs
Severe Straggler Problem in LLM FL
Weight-Level Configuration
Budget-Aware Model Parameters Alignment
Automated Federated Pipeline Design
Overview of FedPipe
Modelling Heterogeneous LoRA Adapters
Identification of Important Weights
Heterogeneous LoRA Adapters Configuration
Quantization for Adapters' Training
LoRA Adapters Aggregation and Deployment
Implementation and Experimental Setup
Evaluation
...and 4 more sections

Figures (16)

Figure 1: A scenario of LLM FL via PEFT with edge servers.
Figure 2: An illustration of the LoRA method.
Figure 3: Training time and exchanged parameters of LoRA and full-parameter fine-tuning for GPT-2, where $r$ is the rank of the LoRA adapter.
Figure 4: The straggler problem on GPT-2 and BERT.
Figure 5: GPT-2 training performance with different numbers of trainable weights, where the trainable-weight combinations for quantities 1, 2, 3, and 4 are $\{\mathbf{W}_q\}$, $\{\mathbf{W}_q, \mathbf{W}_k\}$, $\{\mathbf{W}_q, \mathbf{W}_k, \mathbf{W}_v\}$, and $\{\mathbf{W}_q, \mathbf{W}_k, \mathbf{W}_v, \mathbf{W}_o\}$, respectively.
...and 11 more figures
