
Efficient Zero-Order Federated Finetuning of Language Models for Resource-Constrained Devices

Mohamed Aboelenien Ahmed, Kilian Pfeiffer, Ramin Khalili, Heba Khdr, Jörg Henkel

TL;DR

This work tackles the challenge of privately finetuning large language models on resource-constrained edge devices by introducing METHOD (FedSPZO), a zero-order FL method that splits the model into two blocks and applies a different number of perturbations to each. By combining a seed-based perturbation scheme with server-driven model reconstruction, METHOD keeps the memory footprint close to that of inference while reducing per-step computation by up to about $3\times$ compared to prior zero-order FL methods, with only modest accuracy trade-offs. Experiments with RoBERTa-Large on SST2, RTE, WIC, and BOOLQ show substantial memory and computation savings and competitive accuracy relative to zero-order baselines, and lower memory and communication costs than first-order baselines. The approach offers a practical path to privacy-preserving, on-device FL finetuning in resource-constrained environments, with potential future gains from low-precision inference accelerators.

Abstract

Federated fine-tuning offers a promising approach for tuning Large Language Models (LLMs) on edge devices while preserving data privacy. However, fine-tuning these models on edge devices remains challenging due to high memory, communication, and computational demands. Zero-order optimization with task alignment provides a potential solution, enabling fine-tuning with inference-level memory requirements, but it requires a longer convergence time. In this paper, we propose FedSPZO, which divides the network into two blocks and applies a different number of perturbations per block in a computationally efficient way, achieving faster convergence. Our evaluation shows a $1.6\times$ to $3\times$ reduction in computation overhead compared to state-of-the-art zero-order techniques in federated learning.
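To make the idea of zero-order optimization with seed-based perturbations concrete, the following is a minimal sketch of a generic two-point estimate on a toy problem. It is not the two-block scheme proposed in the paper: the function names (`perturbation`, `zo_scalar`, `apply_update`, `toy_loss`, `run`), the quadratic loss, and all hyperparameters are assumptions chosen for illustration. It only shows why training needs no more than forward passes, and why a receiver that knows the seed and one scalar can reproduce the same update.

```python
import random


def perturbation(seed, dim):
    """Regenerate the Gaussian perturbation z from its seed; z is never stored or sent."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def zo_scalar(loss_fn, params, seed, eps=1e-3):
    """Two-point estimate (L(theta + eps*z) - L(theta - eps*z)) / (2*eps): forward passes only."""
    z = perturbation(seed, len(params))
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    return (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)


def apply_update(params, seed, scalar, lr):
    """Rebuild z from the seed and take an SGD-style step along it."""
    z = perturbation(seed, len(params))
    return [p - lr * scalar * zi for p, zi in zip(params, z)]


def toy_loss(params):
    """Hypothetical stand-in for a model's forward pass: squared distance to the origin."""
    return sum(p * p for p in params)


def run(steps=200, lr=0.02):
    """Train a 'client' copy and replay the same steps on a 'server' copy."""
    client = [1.0, -2.0, 0.5]
    server = list(client)
    for step in range(steps):
        g = zo_scalar(toy_loss, client, seed=step)
        client = apply_update(client, step, g, lr)
        # The server copy only ever sees the (seed, scalar) pair.
        server = apply_update(server, step, g, lr)
    return client, server
```

In this sketch the client and server copies stay identical because both regenerate the same perturbation from the shared seed, so each step can be described by a seed and a single scalar rather than a full gradient or weight vector.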

Paper Structure

This paper contains 21 sections, 7 equations, 4 figures, 1 table, 2 algorithms.

Figures (4)

  • Figure 1: An overview of the proposed METHOD (FedSPZO) round.
  • Figure 2: Memory footprint for training RoBERTa-Large with context lengths of $32$ and $256$, using first-order FedAvg and FedAvg(LoRA) with backpropagation at a batch size of $4$, and METHOD at a batch size of $8$.
  • Figure 3: Computation and upload comparison between METHOD (ours) and federated finetuning with LoRA, FedAvg(LoRA), on the SST2 dataset.
  • Figure 4: Normalized computation for METHOD (ours) and other zero-order FL methods. METHOD requires up to $3\times$ and $2.5\times$ less computation than FedZO and DecomFL, respectively.