
FwdLLM: Efficient FedLLM using Forward Gradient

Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, Shangguang Wang

TL;DR

FwdLLM tackles the core bottlenecks of federated LLM fine-tuning on mobile devices by replacing backpropagation with forward-gradient-based perturbed inference, combined with parameter-efficient fine-tuning. Its adaptive global-perturbation pacing and discriminative perturbation sampling manage computation across devices, yielding up to three orders of magnitude faster convergence and substantial memory and energy savings. The approach demonstrates practical FedLLM for sub-billion and billion-sized LLMs (e.g., LLaMA-7B) on commodity hardware, including successful federated tuning with INT4 quantization and on-device NPUs. This work advances privacy-preserving, scalable on-device learning for next-generation mobile AI and broadens the feasibility of large-scale federated LLM deployment.
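To make "perturbed inference" concrete: a forward gradient estimates the gradient of the loss by sampling random directions v and scaling each one by the loss's directional derivative along it, so a device only ever runs forward passes. The NumPy sketch that follows approximates each directional derivative with a forward difference (one extra inference per perturbation). The toy quadratic loss, the function name `forward_gradient`, and the `eps` and `num_perturbations` values are illustrative assumptions, not FwdLLM's implementation. The vector `w` stands in for the small set of trainable parameter-efficient (e.g., adapter) weights, with the frozen backbone folded into the loss function.

```python
import numpy as np


def forward_gradient(loss_fn, w, num_perturbations=64, eps=1e-4, rng=None):
    """Estimate the gradient of loss_fn at w from forward evaluations only."""
    rng = np.random.default_rng() if rng is None else rng
    base_loss = loss_fn(w)                      # one unperturbed inference
    grad = np.zeros_like(w)
    for _ in range(num_perturbations):
        v = rng.standard_normal(w.shape)        # random perturbation direction
        # Directional derivative along v via a forward difference:
        # one extra "perturbed inference" per perturbation.
        jvp = (loss_fn(w + eps * v) - base_loss) / eps
        grad += jvp * v                         # E[(v.g) v] = g for v ~ N(0, I)
    return grad / num_perturbations


# Toy setup (hypothetical): a frozen "backbone" A and a small trainable
# vector w standing in for parameter-efficient adapter weights.
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 10))
b = rng.standard_normal(32)


def loss(w):
    return 0.5 * np.sum((A @ w - b) ** 2)


w = np.zeros(10)
true_grad = A.T @ (A @ w - b)                   # analytic gradient, for checking only
est = forward_gradient(loss, w, num_perturbations=2000, rng=rng)
cos = est @ true_grad / (np.linalg.norm(est) * np.linalg.norm(true_grad))
w_next = w - 0.01 * est                         # one SGD step using the estimate
print(f"cosine(est, true) = {cos:.3f}, loss {loss(w):.2f} -> {loss(w_next):.2f}")
```

Averaging more perturbations lowers the estimator's variance but costs more inferences; that is the kind of speed/accuracy trade-off the paper's adaptive pacing balances across devices. Keeping `w` small, as parameter-efficient fine-tuning does, also keeps the variance of the estimate manageable, since it grows with the number of trainable parameters.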

Abstract

Large Language Models (LLMs) are transforming the landscape of mobile intelligence. Federated Learning (FL), a method for preserving user data privacy, is often employed to fine-tune LLMs for downstream mobile tasks, an approach known as FedLLM. Although recent efforts have addressed the network issues induced by the vast model size, they have not practically mitigated vital challenges of integration with mobile devices, such as significant memory consumption and sluggish model convergence. In response to these challenges, this work introduces FwdLLM, a new FL protocol designed to improve FedLLM efficiency. The key idea of FwdLLM is to employ backpropagation (BP)-free training methods, requiring devices only to execute "perturbed inferences". Consequently, FwdLLM delivers substantially better memory efficiency and time efficiency (expedited by mobile NPUs and an expanded array of participant devices). FwdLLM centers on three key designs: (1) it combines BP-free training with parameter-efficient training methods, an essential way to scale the approach to the LLM era; (2) it systematically and adaptively allocates computational loads across devices, striking a careful balance between convergence speed and accuracy; (3) it discriminatively samples perturbed predictions that are more valuable to model convergence. Comprehensive experiments with five LLMs and three NLP tasks illustrate FwdLLM's significant advantages over conventional methods, including up to three orders of magnitude faster convergence and a 14.6x reduction in memory footprint. Uniquely, FwdLLM paves the way for federated learning of billion-parameter LLMs such as LLaMA on COTS mobile devices, a feat previously unattained.
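Design (3), discriminative sampling, can be sketched as drawing a larger pool of random perturbations and keeping those best aligned with the previous round's global gradient, on the premise that such directions contribute more to convergence. The scoring rule used here (absolute cosine similarity with top-k selection), the pool size, and the names `sample_discriminative_perturbations`, `abs_cosine`, and `pool_factor` are assumptions for illustration; the paper should be consulted for FwdLLM's exact criterion.

```python
import numpy as np


def abs_cosine(vectors, direction):
    """|cosine similarity| between each row of `vectors` and `direction`."""
    unit = direction / np.linalg.norm(direction)
    return np.abs(vectors @ unit) / np.linalg.norm(vectors, axis=1)


def sample_discriminative_perturbations(prev_global_grad, num_keep,
                                        pool_factor=10, rng=None):
    """Keep the num_keep candidates most aligned with the previous global gradient.

    Hypothetical rule: absolute cosine similarity, top-k. The sign is ignored
    because v and -v yield the same forward-gradient term (v.g) v.
    """
    rng = np.random.default_rng() if rng is None else rng
    pool = rng.standard_normal((num_keep * pool_factor, prev_global_grad.shape[0]))
    scores = abs_cosine(pool, prev_global_grad)
    keep = np.argsort(scores)[-num_keep:]   # indices of the num_keep highest scores
    return pool[keep]


rng = np.random.default_rng(1)
prev_grad = rng.standard_normal(64)         # toy stand-in for last round's global gradient
selected = sample_discriminative_perturbations(prev_grad, num_keep=8, rng=rng)
baseline = rng.standard_normal((8, 64))     # unfiltered perturbations, for comparison
print(abs_cosine(selected, prev_grad).mean(), abs_cosine(baseline, prev_grad).mean())
```

Filtering happens before any inference is run, so the extra cost is only generating and scoring candidate vectors; the trade-off is that the resulting estimate is biased toward the previous round's direction rather than being an unbiased forward gradient.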

Paper Structure

This paper contains 43 sections, 2 equations, 26 figures, and 7 tables.

Figures (26)

  • Figure 1: Peak memory footprint of different training methods and inference. Batch size: 8.
  • Figure 2: ALBERT
  • Figure 3: MobileBERT
  • Figure 5: Clients (w/ adapter)
  • Figure 6: Clients (w/o adapter)
  • ...and 21 more figures