RingAda: Pipelining Large Model Fine-Tuning on Edge Devices with Scheduled Layer Unfreezing
Liang Li, Xiaopei Chen, Wen Wu
TL;DR
RingAda tackles the challenge of fine-tuning large transformer models on memory-constrained edge devices by combining adapter-based parameter-efficient fine-tuning with a ring-based pipeline across edge clients. It introduces a top-down adapter unfreezing strategy and a per-batch training routine in which each batch propagates through a ring of devices. This enables continuous pipelined updates, and backpropagation stops early at the lowest unfrozen adapter layer, which reduces computation. The approach substantially reduces fine-tuning time and per-device memory usage while maintaining competitive accuracy compared to full fine-tuning or naive pipeline methods, and it preserves data privacy by keeping local data on-device. This work suggests a practical path toward privacy-preserving, scalable on-device personalization of large models in edge networks.
Abstract
To enable edge intelligence services based on large models (LMs), on-device fine-tuning with locally personalized data allows for continuous and privacy-preserving LM customization. In this paper, we propose RingAda, a collaborative training framework designed for fine-tuning transformer-based LMs on edge devices. Specifically, RingAda performs parameter-efficient adapter fine-tuning across a set of interconnected edge devices, which form a ring topology for per-batch training: the frozen transformer blocks and their trainable adapter modules are placed sequentially on the devices. RingAda follows a novel pipeline-parallel training mechanism with top-down adapter unfreezing, which allows backpropagation to stop early at the lowest unfrozen adapter layer and thereby accelerates fine-tuning. Extensive experimental results demonstrate that RingAda significantly reduces fine-tuning time and memory costs while maintaining competitive model performance compared to its peer designs.
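The interplay between sequential block placement, top-down unfreezing, and early stopping of backpropagation can be illustrated with a small sketch. It is not the authors' implementation: the function names, the even contiguous split of blocks across devices, and the rule of unfreezing one more adapter every fixed number of steps are all assumptions made for illustration, and the sketch does not model the pipelining or the per-batch ring rotation itself.

```python
from typing import List


def assign_blocks_to_ring(num_blocks: int, num_devices: int) -> List[List[int]]:
    """Place block indices 0..num_blocks-1 on devices in contiguous chunks.

    Assumption: an even split; the abstract only says blocks are placed sequentially.
    """
    base, extra = divmod(num_blocks, num_devices)
    placement, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        placement.append(list(range(start, start + size)))
        start += size
    return placement


def unfrozen_adapters(num_blocks: int, step: int, interval: int) -> List[int]:
    """Top-down schedule: begin with the top adapter only, then unfreeze one
    more (the next lower one) every `interval` steps. Hypothetical schedule."""
    depth = min(num_blocks, 1 + step // interval)
    return list(range(num_blocks - depth, num_blocks))


def backward_devices(placement: List[List[int]], unfrozen: List[int]) -> List[int]:
    """Devices the backward pass visits, from the last device down to the one
    holding the lowest unfrozen adapter. Devices below it are skipped."""
    lowest = min(unfrozen)
    return [
        d
        for d in reversed(range(len(placement)))
        if placement[d] and placement[d][-1] >= lowest
    ]


if __name__ == "__main__":
    placement = assign_blocks_to_ring(num_blocks=12, num_devices=4)
    for step in (0, 300, 5000):
        active = unfrozen_adapters(12, step, interval=100)
        print(step, active, backward_devices(placement, active))
```

Under these assumptions, early in training only the last device takes part in the backward pass, and lower devices join as more adapters are unfrozen. This is the source of the computation savings the abstract describes.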
