Deploying Large AI Models on Resource-Limited Devices with Split Federated Learning

Xianke Qiang, Hongda Liu, Xinran Zhang, Zheng Chang, Ying-Chang Liang

TL;DR

This work tackles the challenge of deploying large AI models on resource-constrained edge devices by introducing Quantized Split Federated Fine-Tuning Large AI Model (SFLAM), which partitions training between devices and a central server: embedding layers stay on the devices, while the heavier transformer components run on the server. It combines quantization, power control, and bandwidth allocation to reduce memory, energy, and latency while preserving performance. The authors provide a convergence analysis under activation quantization and formulate a multi-objective optimization problem, which they solve with a joint block coordinate descent (BCD) framework that uses successive convex approximation (SCA) and matching techniques. Simulations on ViT-based image classification with non-IID data demonstrate improved efficiency and scalability over traditional federated learning (FL) and split learning (SL) approaches, highlighting SFLAM’s practicality for edge deployment of large AI models.

Abstract

Large Artificial Intelligence Models (LAMs), powered by massive datasets, large parameter counts, and extensive computational resources, are driving significant transformations across various industries. Yet their practical deployment on resource-limited mobile edge devices is hindered by critical challenges such as data privacy, constrained resources, and high overhead costs. To address this gap, this paper proposes a novel framework named Quantized Split Federated Fine-Tuning Large AI Model (SFLAM). By partitioning the training load between edge devices and servers using a split learning paradigm, SFLAM enables large models to be fine-tuned with edge devices in the loop and significantly lowers the memory requirements on those devices. Additionally, SFLAM incorporates quantization management, power control, and bandwidth allocation strategies to enhance training efficiency while reducing energy consumption and communication latency. A theoretical analysis of the latency-energy trade-off is presented, and the framework's efficacy is validated via comprehensive simulations. The findings indicate that SFLAM achieves better learning efficiency and scalability than conventional methods, providing a valuable approach for enabling advanced AI services in resource-constrained scenarios.

Paper Structure

This paper contains 31 sections, 5 theorems, 40 equations, 6 figures, 2 tables, 3 algorithms.

Key Result

Lemma 1

(Unbiased Quantization Scheme li2017training10038639) A randomized mapping $\mathcal{Q}: \mathbb{R}^d \rightarrow \mathbb{R}^d$ is an unbiased quantization scheme if there exists $\delta$ such that $\mathbb{E}\left[\mathcal{Q}(\mathcal{A})\right] = \mathcal{A}$, $\mathbb{E}\left[\|\mathcal{Q}(\mathc
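The unbiasedness condition $\mathbb{E}\left[\mathcal{Q}(\mathcal{A})\right] = \mathcal{A}$ in Lemma 1 can be illustrated with a small numerical sketch. The code below is not taken from the paper: it assumes a uniform grid between the minimum and maximum of the input and uses stochastic rounding, which is one common way to obtain an unbiased quantizer. The function name `stochastic_quantize` and the parameter `num_bits` are hypothetical names chosen for this illustration.

```python
import numpy as np

def stochastic_quantize(a, num_bits=4, rng=None):
    """Illustrative unbiased quantizer (an assumption, not the paper's scheme).

    Maps each entry of `a` onto a uniform grid of 2**num_bits levels spanning
    [a.min(), a.max()], rounding up or down at random so that the expected
    output equals the input.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=np.float64)
    lo, hi = a.min(), a.max()
    if hi == lo:
        # A constant input already lies on the (degenerate) grid.
        return a.copy()
    levels = 2 ** num_bits - 1            # number of grid intervals
    step = (hi - lo) / levels
    scaled = (a - lo) / step              # position measured in grid steps
    floor = np.floor(scaled)
    prob_up = scaled - floor              # fractional part in [0, 1)
    # Round up with probability equal to the fractional part, so that
    # E[rounded] = floor + prob_up = scaled.
    rounded = floor + (rng.random(a.shape) < prob_up)
    return lo + rounded * step

# Averaging many independent quantizations should approach the input.
rng = np.random.default_rng(0)
activations = rng.normal(size=8)
mean_q = np.mean(
    [stochastic_quantize(activations, 4, rng) for _ in range(20000)], axis=0
)
max_bias = np.max(np.abs(mean_q - activations))
```

In this sketch each quantized entry differs from the original by at most one grid step, and `max_bias` shrinks as the number of averaged samples grows, which is the behaviour the unbiasedness condition describes.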

Figures (6)

  • Figure 1: The architecture of LAMs.
  • Figure 2: The framework of SFLAM.
  • Figure 3: Testing accuracy with random selection of 10 out of 50 devices under different Dirichlet distributions $Dir(\alpha)$ and quantization settings.
  • Figure 4: Average energy consumption and objective value after solving three subproblems under different $T_{max}$.
  • Figure 5: Time and Energy Consumption
  • ...and 1 more figure

Theorems & Definitions (8)

  • Lemma 1
  • Proposition 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 1
  • proof