Efficient Zero-Order Federated Finetuning of Language Models for Resource-Constrained Devices

Mohamed Aboelenien Ahmed; Kilian Pfeiffer; Ramin Khalili; Heba Khdr; Jörg Henkel

TL;DR

This work tackles the challenge of privately fine-tuning large language models on resource-constrained edge devices by introducing FedSPZO, a zero-order federated learning (FL) method that splits the model into two blocks and applies a different number of perturbations to each. By combining a seed-based perturbation scheme with server-driven model reconstruction, FedSPZO keeps the memory footprint close to that of inference while reducing per-step computation by up to 3× compared with prior zero-order FL methods, with only modest accuracy trade-offs. Experiments with RoBERTa-Large on SST-2, RTE, WiC, and BoolQ show substantial memory and computation savings, accuracy competitive with zero-order baselines, and favorable memory and communication efficiency relative to first-order baselines. The approach offers a practical path toward privacy-preserving, on-device FL fine-tuning in resource-constrained environments, with potential further gains from low-precision inference accelerators.
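The seed-based perturbation idea can be illustrated with a minimal sketch of generic seed-based zeroth-order SGD (two-point SPSA-style estimate). This is not the authors' implementation: the function names (`perturbation`, `client_step`, `apply_update`, `replay`), the step size `lr`, the smoothing `eps`, and the use of Python's `random` module as the seeded generator are all hypothetical choices for illustration. A client needs only forward passes to compute a scalar projected gradient per seed, and anyone holding the starting weights can rebuild the resulting model from the `(seed, g)` pairs alone.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def perturbation(seed: int, dim: int) -> Vector:
    """Regenerate a Gaussian direction z ~ N(0, I) deterministically from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def client_step(theta: Vector, loss: Callable[[Vector], float],
                seed: int, eps: float) -> float:
    """Two forward passes, no backpropagation: return the projected gradient g."""
    z = perturbation(seed, len(theta))
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    return (loss(plus) - loss(minus)) / (2.0 * eps)


def apply_update(theta: Vector, seed: int, g: float, lr: float) -> Vector:
    """Apply theta <- theta - lr * g * z, regenerating z from the seed."""
    z = perturbation(seed, len(theta))
    return [t - lr * g * zi for t, zi in zip(theta, z)]


def replay(theta: Vector, messages: List[Tuple[int, float]], lr: float) -> Vector:
    """Server side (hypothetical): rebuild a client's model from (seed, g) messages only."""
    for seed, g in messages:
        theta = apply_update(theta, seed, g, lr)
    return theta
```

Each message is one integer and one float, so in this sketch the upload size is independent of the model size; the server reproduces the client's weights exactly because both sides regenerate the same `z` from the same seed.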

Abstract

Federated fine-tuning offers a promising approach for tuning Large Language Models (LLMs) on edge devices while preserving data privacy. However, fine-tuning these models on edge devices remains challenging due to high memory, communication, and computational demands. Zero-order optimization with task alignment provides a potential solution, enabling fine-tuning with inference-level memory requirements, but it requires a longer time to converge. In this paper, we propose FedSPZO, which divides the network into two blocks and applies a different number of perturbations to each block in a computationally efficient way, achieving faster convergence. Our evaluation shows a $1.6\times$ to $3\times$ reduction in computation overhead compared to state-of-the-art zero-order techniques in federated learning.
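The abstract does not spell out how the two blocks share work, so the following is only a minimal sketch under stated assumptions, not FedSPZO itself. It assumes (hypothetically) that block 2 receives more perturbations than block 1 and that, for block-2 perturbations, the output `h` of the unperturbed block 1 is computed once and reused. The names `two_block_zo_step`, `block1`, `block2`, `seeds1`, `seeds2`, `eps`, `lr`, and the `calls` counter are illustrative inventions.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def gaussian(seed: int, dim: int) -> Vector:
    """Regenerate a perturbation z ~ N(0, I) from an integer seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def shift(theta: Vector, z: Vector, scale: float) -> Vector:
    """Return theta + scale * z elementwise."""
    return [t + scale * zi for t, zi in zip(theta, z)]


def two_block_zo_step(
    theta1: Vector,
    theta2: Vector,
    block1: Callable[[Vector], Vector],         # hypothetical: params -> activations h
    block2: Callable[[Vector, Vector], float],  # hypothetical: (params, h) -> loss
    seeds1: Sequence[int],                      # P1 = len(seeds1) perturbations
    seeds2: Sequence[int],                      # P2 = len(seeds2) perturbations
    eps: float,
    lr: float,
    calls: Dict[str, int],                      # counts forward passes per block
) -> Tuple[Vector, Vector]:
    def run1(p: Vector) -> Vector:
        calls["block1"] += 1
        return block1(p)

    def run2(p: Vector, h: Vector) -> float:
        calls["block2"] += 1
        return block2(p, h)

    # Block 1: every perturbation changes h, so each needs two passes through both blocks.
    g1 = [0.0] * len(theta1)
    for s in seeds1:
        z = gaussian(s, len(theta1))
        lp = run2(theta2, run1(shift(theta1, z, eps)))
        lm = run2(theta2, run1(shift(theta1, z, -eps)))
        g1 = shift(g1, z, (lp - lm) / (2 * eps) / len(seeds1))

    # Block 2: run the unperturbed block 1 once and reuse the cached h for all P2 perturbations.
    h = run1(theta1)
    g2 = [0.0] * len(theta2)
    for s in seeds2:
        z = gaussian(s, len(theta2))
        lp = run2(shift(theta2, z, eps), h)
        lm = run2(shift(theta2, z, -eps), h)
        g2 = shift(g2, z, (lp - lm) / (2 * eps) / len(seeds2))

    # Simultaneous SGD-style update of both blocks from the averaged estimates.
    return shift(theta1, g1, -lr), shift(theta2, g2, -lr)
```

Under these assumptions one step costs $2P_1 + 1$ passes through block 1 and $2(P_1 + P_2)$ passes through block 2, whereas perturbing the whole model $P_2$ times would cost $2P_2$ passes through each block; with $P_1 = 1$ and $P_2 = 8$ that is 3 instead of 16 block-1 passes. The exact accounting in FedSPZO may differ.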