
RingAda: Pipelining Large Model Fine-Tuning on Edge Devices with Scheduled Layer Unfreezing

Liang Li, Xiaopei Chen, Wen Wu

TL;DR

RingAda tackles the challenge of fine-tuning large transformer models on memory-constrained edge devices by combining adapter-based parameter-efficient fine-tuning with a ring-based pipeline across edge clients. It introduces a top-down adapter unfreezing strategy and a per-batch training routine that propagates through a ring of devices, enabling continuous pipelined updates while early-stopping backpropagation at the lowest unfrozen adapter layer to reduce computation. The approach achieves substantial reductions in fine-tuning time and per-device memory usage, while maintaining competitive accuracy compared to full fine-tuning or naive pipeline methods, and preserves data privacy by keeping local data on-device. This work suggests a practical path for privacy-preserving, scalable on-device personalization of large language models in edge networks.
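The interaction between top-down unfreezing and early-stopped backpropagation can be illustrated with a short Python sketch. The schedule here is an assumption made for illustration, not RingAda's documented rule. The hypothetical `UnfreezeSchedule` starts with only the topmost adapter trainable and unfreezes one more adapter, moving downward, every `interval` steps. The hypothetical `backward_layers` then lists the layers whose backward pass must run at a given step: every layer from the top down to the lowest unfrozen adapter.

```python
from dataclasses import dataclass


@dataclass
class UnfreezeSchedule:
    """Hypothetical top-down unfreezing schedule (illustrative assumption).

    Adapter layers are indexed 0 (bottom, nearest the embeddings) to
    num_layers - 1 (top). Training starts with the top `initial_unfrozen`
    adapters trainable and unfreezes one more, downward, every `interval` steps.
    """
    num_layers: int
    interval: int
    initial_unfrozen: int = 1

    def lowest_unfrozen(self, step: int) -> int:
        # Number of unfrozen adapters at this step, capped at the model depth.
        unfrozen = min(self.num_layers, self.initial_unfrozen + step // self.interval)
        return self.num_layers - unfrozen

    def trainable_layers(self, step: int) -> list[int]:
        # Every adapter at or above the lowest unfrozen layer is updated.
        return list(range(self.lowest_unfrozen(step), self.num_layers))


def backward_layers(schedule: UnfreezeSchedule, step: int) -> list[int]:
    """Layers whose backward pass must run at `step`, in top-down order.

    Backpropagation can stop at the lowest unfrozen adapter, because no
    parameter below that layer needs a gradient.
    """
    return list(reversed(schedule.trainable_layers(step)))
```

For example, with 12 layers and `interval=100`, step 250 runs backward only through layers 11, 10, and 9, and skips the nine layers below them for that batch.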

Abstract

To enable large model (LM)-based edge intelligence service provisioning, on-device fine-tuning with locally personalized data allows for continuous and privacy-preserving LM customization. In this paper, we propose RingAda, a collaborative training framework for fine-tuning transformer-based LMs on edge devices. Specifically, RingAda performs parameter-efficient adapter fine-tuning across a set of interconnected edge devices that form a ring topology for per-batch training, with frozen transformer blocks and their trainable adapter modules placed sequentially on the devices. RingAda follows a novel pipeline-parallel training mechanism with top-down adapter unfreezing, which allows backpropagation to stop early at the lowest unfrozen adapter layer and thereby accelerates fine-tuning. Extensive experimental results demonstrate that RingAda significantly reduces fine-tuning time and memory costs while maintaining model performance competitive with peer designs.
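To make the device placement concrete, the Python sketch below splits transformer blocks across devices arranged in a ring and determines which devices still run backpropagation once early stopping applies. The balanced contiguous partition and the helper names (`partition_blocks`, `ring_next`, `backward_devices`) are hypothetical. The abstract states only that blocks and their adapters are placed sequentially on the devices.

```python
def partition_blocks(num_blocks: int, num_devices: int) -> list[range]:
    """Split blocks into contiguous, near-equal stages, one per device.

    Balanced contiguous splitting is an illustrative assumption; RingAda's
    actual placement policy may differ.
    """
    if num_blocks < num_devices:
        raise ValueError("need at least one block per device")
    base, extra = divmod(num_blocks, num_devices)
    stages, start = [], 0
    for d in range(num_devices):
        # The first `extra` devices take one additional block each.
        size = base + (1 if d < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def ring_next(device: int, num_devices: int) -> int:
    # Each device sends activations to its successor; the last wraps to the first.
    return (device + 1) % num_devices


def backward_devices(stages: list[range], lowest_unfrozen: int) -> list[int]:
    """Devices that run backward, in the order gradients reach them.

    A device whose blocks all lie below the lowest unfrozen adapter never
    receives gradients, so it skips backpropagation for that batch.
    """
    active = [d for d, blocks in enumerate(stages) if blocks.stop > lowest_unfrozen]
    return list(reversed(active))
```

With 12 blocks on four devices (matching the four-client instance in Figure 2) and the lowest unfrozen adapter at block 7, only devices 3 and 2 run backward. Devices 0 and 1 only compute and forward activations for that batch.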

Paper Structure

This paper contains 9 sections, 1 equation, 3 figures, 1 table, and 1 algorithm.

Figures (3)

  • Figure 1: Serial adapter architecture with layer freezing for transformer-based large models.
  • Figure 2: The training workflow of RingAda (an instance with four edge clients).
  • Figure 3: Training performance of the three schemes.
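Figure 1 refers to a serial adapter architecture. A widely used serial (bottleneck) adapter is inserted after a frozen transformer sublayer and computes a down-projection, a nonlinearity, an up-projection, and a residual connection. The pure-Python sketch below assumes that design with ReLU and no biases, and the class name `SerialAdapter` is hypothetical. This summary does not give RingAda's exact adapter configuration.

```python
import math
import random


def relu(x: float) -> float:
    return max(0.0, x)


class SerialAdapter:
    """Bottleneck adapter computing h + W_up relu(W_down h) (assumed design, no biases)."""

    def __init__(self, d_model: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_model)
        # Down-projection d_model -> bottleneck, small random initialization.
        self.w_down = [[rng.gauss(0.0, scale) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        # Up-projection bottleneck -> d_model, zero-initialized so the adapter
        # starts as an identity map and leaves the frozen model's output unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]

    def num_params(self) -> int:
        # Trainable weights: both projection matrices.
        return (len(self.w_down) * len(self.w_down[0])
                + len(self.w_up) * len(self.w_up[0]))

    def __call__(self, h: list[float]) -> list[float]:
        z = [relu(sum(w * x for w, x in zip(row, h))) for row in self.w_down]
        delta = [sum(w * v for w, v in zip(row, z)) for row in self.w_up]
        # Residual connection around the bottleneck.
        return [x + d for x, d in zip(h, delta)]
```

With `d_model=768` and `bottleneck=64`, one adapter holds 98,304 weights. That is under 2% of the roughly 7.1 million attention and feed-forward weights in a BERT-base-sized encoder block, which is why keeping the blocks frozen and training only adapters shrinks gradient and optimizer memory.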