SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices

Jian Ma, Xinchen Lyu, Jun Jiang, Qimei Cui, Haipeng Yao, Xiaofeng Tao

TL;DR

The paper tackles the challenge of fine-tuning large language models on private data using heterogeneous, resource-constrained edge devices. It introduces SplitFrozen, a split-learning framework that freezes device-side layers so that devices run forward-only computation, and centralizes LoRA-based updates on a server, with pipeline parallelism coordinating the two sides to handle device heterogeneity. Key contributions include heterogeneity-aware layer partitioning, server-centric aggregation to mitigate non-IID data effects, and a three-stage training workflow that reduces device computation and total training time while maintaining accuracy. Empirical results on GPT-2 small across MRPC, MNLI-matched, and SST-2, and on Llama-3.2 with GSM8K, show substantial accuracy gains under non-IID distributions (up to 69.4% over baselines) and significant reductions in device computation (up to 86.8%) and total training time (up to 50.2%), with scalability to content-generation tasks. This work provides a practical approach to privacy-preserving, resource-efficient on-edge fine-tuning suitable for integration into 6G edge networks and personalized AI applications.

Abstract

Fine-tuning large language models (LLMs) on private, on-device data can empower tailored personalized AI agents. However, fine-tuning LLMs on resource-constrained edge devices faces significant challenges, including excessive computation overhead, device heterogeneity, and data imbalance. This paper proposes SplitFrozen, a split learning framework that enables efficient LLM fine-tuning by strategically freezing device-side model layers while centralizing parameter-efficient fine-tuning on the server. Our framework partitions LLMs into device-side frozen layers and server-side fine-tuning layers, where heterogeneous resource-constrained devices execute only forward propagation. To minimize server-side training costs, we integrate Low-Rank Adaptation (LoRA) into the server-side layers. A pipeline parallelism strategy further optimizes training efficiency by decoupling device-server computations and leveraging decomposed backward propagation. Experiments on GPT-2 with the MRPC, MNLI-matched, and SST-2 datasets demonstrate that SplitFrozen outperforms FedLoRA and SplitLoRA by 69.4% in model accuracy under extremely imbalanced data, while reducing device-side computation by up to 86.8% and total training time by up to 50.2%. Experiments also validate the scalability of SplitFrozen on a content generation task using the Llama-3.2 model on the GSM8K dataset.
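
The division of work described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names `FrozenDeviceLayers` and `LoRAServerLayer`, the helper functions, and the use of small linear-plus-ReLU layers in place of transformer blocks are all assumptions made for illustration, and no training loop or pipeline scheduling is shown. The sketch only shows that the device runs a forward pass over frozen weights, while the server holds a frozen base weight plus a trainable low-rank pair (A, B), with B zero-initialized as is conventional for LoRA.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for pretrained weights (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class FrozenDeviceLayers:
    """Device side (hypothetical): frozen weights, forward pass only."""
    def __init__(self, dim, n_layers, rng):
        self.weights = [rand_matrix(dim, dim, rng) for _ in range(n_layers)]

    def forward(self, x):
        for W in self.weights:
            x = [max(0.0, h) for h in matvec(W, x)]  # linear layer + ReLU
        return x  # activations that would be sent to the server

class LoRAServerLayer:
    """Server side (hypothetical): frozen W plus trainable low-rank B @ A."""
    def __init__(self, dim, rank, rng, alpha=1.0):
        self.W = rand_matrix(dim, dim, rng)          # frozen base weight
        self.A = rand_matrix(rank, dim, rng)         # trainable, rank x dim
        self.B = [[0.0] * rank for _ in range(dim)]  # trainable, zero-init
        self.scale = alpha / rank

    def forward(self, h):
        base = matvec(self.W, h)
        delta = matvec(self.B, matvec(self.A, h))    # low-rank correction
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

rng = random.Random(0)
dim, rank = 8, 2
device = FrozenDeviceLayers(dim, n_layers=3, rng=rng)
server = LoRAServerLayer(dim, rank, rng)

activations = device.forward([1.0] * dim)  # device: forward only
output = server.forward(activations)       # server: base + LoRA path
print(server.trainable_params(), "trainable vs", dim * dim, "frozen weights in the server layer")
```

In this toy setting the server layer trains rank × (dim + dim) values instead of dim × dim, and because B starts at zero the LoRA path initially leaves the base output unchanged.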

Paper Structure

This paper contains 14 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 2: The pipelining parallelism optimization of SplitFrozen to accelerate wireless fine-tuning.
  • Figure 3: The performance gain of SplitFrozen over SplitLoRA [4] under different matrix ranks and data distributions.
  • Figure : (a) FedLoRA[3]
  • Figure : (b) SplitLoRA[4]
  • ...and 1 more figures