Adaptive Parameter-Efficient Federated Fine-Tuning on Heterogeneous Devices

Jun Liu, Yunming Liao, Hongli Xu, Yang Xu, Jianchun Liu, Chen Qian

TL;DR

Federated fine-tuning of large language models on edge devices is hindered by resource constraints and device heterogeneity. The authors present LEGEND, an adaptive LoRA-based FedFT framework that jointly optimizes LoRA depth and rank distribution across devices, guided by capacity estimation and a greedy LCD algorithm, with adaptive per-layer aggregation. They demonstrate on a real 80-device testbed that LEGEND achieves substantial speedups (1.5-2.8×) and reduces communication costs by about 42.3%, while maintaining or improving target accuracies across NLP tasks. This work enables practical, privacy-preserving FedFT on heterogeneous hardware by balancing computation, communication, and convergence.

Abstract

Federated fine-tuning (FedFT) has been proposed to fine-tune pre-trained language models in a distributed manner. However, efficient FedFT in practical applications faces two critical challenges: resource constraints and system heterogeneity. Existing works rely on parameter-efficient fine-tuning methods, e.g., low-rank adaptation (LoRA), but have major limitations. Based on the inherent characteristics of FedFT, we observe that adding LoRA layers with higher ranks close to the output helps reduce resource consumption while achieving comparable fine-tuning performance. We then propose a novel LoRA-based FedFT framework, termed LEGEND, whose central difficulty is determining the number of LoRA layers (called LoRA depth) and the rank of each LoRA layer (called rank distribution). We analyze the coupled relationship between LoRA depth and rank distribution, and design an efficient LoRA configuration algorithm for heterogeneous devices, thereby improving fine-tuning efficiency. Extensive experiments are conducted on a physical platform with 80 commercial devices. The results show that, when reaching the target accuracy, LEGEND achieves a speedup of 1.5-2.8× and reduces communication costs by about 42.3% compared to the advanced solutions.
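
To make the two configuration knobs concrete, the sketch below counts trainable LoRA parameters for two hypothetical configurations. It relies only on the standard LoRA fact that a rank-r adapter on a d_out × d_in weight adds r · (d_in + d_out) trainable parameters. The hidden size (768), the 12-layer model, the two adapted matrices per layer, and the specific rank values are all illustrative assumptions, not settings taken from the paper, and the function names are hypothetical.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def config_params(ranks: list[int], hidden: int = 768, adapted_per_layer: int = 2) -> int:
    """Total trainable LoRA parameters for a per-layer rank list.

    ranks[i] == 0 means transformer layer i carries no LoRA adapter.
    hidden and adapted_per_layer are assumptions made for this illustration.
    """
    return sum(adapted_per_layer * lora_params(hidden, hidden, r) for r in ranks)


num_layers = 12  # assumption: a 12-layer encoder

# Baseline: the same rank on every layer (one shared configuration for all devices).
uniform = [8] * num_layers

# Illustrative alternative: LoRA depth 4, adapters only on the last 4 layers,
# with ranks growing toward the output (the rank distribution).
depth = 4
near_output = [0] * (num_layers - depth) + [4, 8, 12, 16]

uniform_total = config_params(uniform)
near_output_total = config_params(near_output)

print(f"uniform rank 8, depth 12: {uniform_total} trainable parameters")
print(f"depth 4, ranks 4..16:     {near_output_total} trainable parameters")
```

Under these assumptions the depth-4 configuration trains fewer parameters than the uniform one even though its top layers use higher ranks. Parameter count is only a rough proxy for upload size; it says nothing about accuracy or about how LEGEND itself selects depth and ranks for each device.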

Paper Structure (21 sections, 19 equations, 13 figures, 2 tables, 1 algorithm)

Figures (13)

  • Figure 1: Illustration of FedNLP, FedLoRA, and LEGEND. FedNLP (left) fine-tunes all parameters of the LM; FedLoRA (mid) applies the same LoRA configuration to all devices; LEGEND (right) applies different LoRA configurations (e.g., LoRA depth) to devices with heterogeneous capabilities.
  • Figure 2: Fine-tuning RoBERTa at different positions.
  • Figure 3: The impact of LoRA position.
  • Figure 4: The impact of LoRA depth.
  • Figure 5: The impact of LoRA rank distribution.
  • ...and 8 more figures