SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao

TL;DR

SplitLoRA tackles the data-scarcity and model-scale challenges of LLM fine-tuning with a split learning (SL) framework built on split federated learning (SFL) and LoRA. It partitions the model into a client-side component $W_c$ and a server-side component $W_s$, attaches LoRA adapters $A$ and $B$ for low-rank updates, and trains across multiple clients in parallel while aggregating only the adapters. Empirical results on GPT-2 S/M with the E2E dataset show that SplitLoRA reaches converged accuracy close to centralized baselines while significantly reducing client-side computation and communication, and that it converges faster than fully federated LoRA baselines. The work provides the first open-source benchmark for SL LLM fine-tuning and points to practical directions in model splitting, heterogeneity handling, efficiency, and privacy for real-world deployments.
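
As a rough illustration of the structure described above, the following NumPy sketch models one frozen client-side weight and one frozen server-side weight, each with low-rank adapters, and then averages the client-side adapters across clients. It is a toy under stated assumptions, not the authors' implementation: the single-layer model, the dimensions `d` and `r`, the helper names `lora_forward` and `aggregate_adapters`, and the sample-size-weighted (FedAvg-style) averaging rule are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values, not from the paper)

def lora_forward(x, W, A, B):
    """Apply a frozen weight W plus the low-rank update B @ A."""
    return x @ (W + B @ A).T

def aggregate_adapters(adapters, num_samples):
    """Sample-weighted average of each client's (A, B); W is never exchanged.

    Assumption: a FedAvg-style weighted mean, chosen here for illustration.
    """
    weights = np.asarray(num_samples, dtype=float)
    weights /= weights.sum()
    A_avg = sum(w * A for w, (A, _) in zip(weights, adapters))
    B_avg = sum(w * B for w, (_, B) in zip(weights, adapters))
    return A_avg, B_avg

W_c = rng.standard_normal((d, d))  # frozen client-side weight
W_s = rng.standard_normal((d, d))  # frozen server-side weight

# Three clients, each holding its own copy of the client-side adapters.
# Random values stand in for adapters after some local training.
client_adapters = [
    (rng.standard_normal((r, d)) * 0.01, rng.standard_normal((d, r)) * 0.01)
    for _ in range(3)
]
# Server-side adapters; B starts at zero so the update B @ A starts at zero.
A_s, B_s = rng.standard_normal((r, d)) * 0.01, np.zeros((d, r))

x = rng.standard_normal((4, d))             # a batch held by client 0
A0, B0 = client_adapters[0]
smashed = lora_forward(x, W_c, A0, B0)      # activations sent to the server
out = lora_forward(smashed, W_s, A_s, B_s)  # server finishes the forward pass

A_avg, B_avg = aggregate_adapters(client_adapters, num_samples=[100, 200, 100])
```

Only `A_avg` and `B_avg` would be sent back to the clients in this sketch, which is why the aggregation traffic scales with the rank `r` rather than with the full weight matrices.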

Abstract

The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, high-quality public datasets are expected to be depleted within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm has recently been proposed to facilitate collaborative LLM fine-tuning on distributed private data, where multiple data owners collaboratively fine-tune a shared LLM without sharing raw data. However, the staggering size of LLMs imposes heavy computing and communication burdens on clients, posing significant barriers to the democratization of the FL LLM fine-tuning paradigm. To address this issue, split learning (SL) has emerged as a promising solution: it offloads the primary training workload to a server via model partitioning and exchanges activations and activation gradients, which are smaller than the entire LLM. Unfortunately, research on the SL LLM fine-tuning paradigm is still in its nascent stage. To fill this gap, in this paper, we propose the first SL LLM fine-tuning framework, named SplitLoRA. SplitLoRA is built on the split federated learning (SFL) framework, combining the advantages of parallel training from FL and model splitting from SL and thus greatly enhancing training efficiency. It is worth noting that SplitLoRA is the first open-source benchmark for SL LLM fine-tuning, providing a foundation for research efforts dedicated to advancing SL LLM fine-tuning. Extensive simulations validate that SplitLoRA achieves target accuracy in significantly less time than state-of-the-art LLM fine-tuning frameworks, demonstrating the superior training performance of SplitLoRA. The project page is available at https://fduinc.github.io/splitlora/.
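
To make the size comparison in the abstract concrete, the back-of-the-envelope sketch below compares the bytes exchanged at the cut layer in one training step with the bytes needed to ship a full set of model weights. The GPT-2 small figures (about 124M parameters, hidden size 768) are the commonly quoted ones; the batch size, sequence length, fp32 precision, and the helper names `tensor_bytes` and `activation_bytes` are assumptions for illustration, not settings taken from the paper.

```python
def tensor_bytes(num_elements, bytes_per_element=4):
    """Size of a dense tensor in bytes (4 bytes per element assumes fp32)."""
    return num_elements * bytes_per_element

def activation_bytes(batch_size, seq_len, hidden_size, bytes_per_element=4):
    """Bytes sent in one direction at the cut layer for a single training step."""
    return tensor_bytes(batch_size * seq_len * hidden_size, bytes_per_element)

# GPT-2 small: roughly 124M parameters, hidden size 768.
GPT2_S_PARAMS = 124_000_000
GPT2_S_HIDDEN = 768

# Batch size and sequence length are assumptions for illustration only.
per_step = activation_bytes(batch_size=8, seq_len=512, hidden_size=GPT2_S_HIDDEN)
full_model = tensor_bytes(GPT2_S_PARAMS)

print(f"activations per step (one direction): {per_step / 1e6:.1f} MB")
print(f"full model weights:                   {full_model / 1e6:.1f} MB")
```

The activation cost is paid on every step, once for the forward activations and once for the returned gradients, so total traffic depends on how many steps are run; the sketch compares per-message sizes only.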

Paper Structure

This paper contains 18 sections, 6 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of proposed SplitLoRA framework.
  • Figure 2: The converged accuracy for GPT2-S and GPT2-M models, where Perplexity (PPL) is a metric used to measure how well LLMs predict a sample, with lower PPL indicating better predictive performance.
  • Figure 3: The training performance on GPT2-S and GPT2-M for E2E NLG challenge.
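
The perplexity metric mentioned in the Figure 2 caption can be sketched as follows, using the standard definition: the exponential of the mean negative log-likelihood per token. The function name `perplexity` and the toy inputs are hypothetical; the paper's own evaluation code may differ in tokenization and averaging details.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that assigns probability 1/4 to every token has perplexity of about 4:
# it is as uncertain as a uniform choice among four options.
uniform = [math.log(1 / 4)] * 10
confident = [math.log(0.9)] * 10

print(perplexity(uniform))
print(perplexity(confident))  # lower PPL: the model predicts the tokens better
```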