Implementation of Big AI Models for Wireless Networks with Collaborative Edge Computing

Liekang Zeng, Shengyuan Ye, Xu Chen, Yang Yang

TL;DR

The paper presents collaborative edge training as a framework to leverage a pool of trusted edge devices for sustainable, privacy-preserving training of large Transformer-based AI models at the wireless edge. It contrasts this approach with cloud, on-device, and federated methods, arguing that collaboration within trusted domains can improve efficiency and privacy while enabling training of models that exceed single-device capacities. A four-phase framework is proposed: profile participants, design energy-aware parallelism, replicate model partitions across devices, and run in-domain training with inter-device data exchange. A case study analyzes four Transformer parallelisms (data, sequence, tensor, pipeline) on realistic Jetson-based testbeds, showing that data and pipeline parallelism with GPU acceleration achieve favorable energy efficiency and enable training of large models that other parallelisms cannot support due to memory or communication limits. The paper also outlines open challenges across sustainability metrics, orchestration, incentives, wireless-network design, and privacy/security tradeoffs, pointing to future research directions for sustainable edge-centric AI training.

Abstract

Big Artificial Intelligence (AI) models have emerged as a crucial element in various intelligent applications at the edge, such as voice assistants in smart homes and autonomous robotics in smart factories. Training big AI models, e.g., for personalized fine-tuning and continual model refinement, poses significant challenges to edge devices due to the inherent conflict between limited computing resources and the intensive workload associated with training. Given the constraints of on-device training, traditional approaches usually resort to aggregating training data and sending it to a remote cloud for centralized training. Nevertheless, this approach is neither sustainable, as it strains long-range backhaul transmission and relies on energy-consuming datacenters, nor private, as it shares users' raw data with remote infrastructures. To address these challenges, we observe that prevalent edge environments usually contain a diverse collection of trusted edge devices with untapped idle resources, which can be leveraged to accelerate edge training. Motivated by this, in this article we propose collaborative edge training, a novel training mechanism that orchestrates a group of trusted edge devices as a resource pool for expedited, sustainable big AI model training at the edge. As an initial step, we present a comprehensive framework for building collaborative edge training systems and analyze in depth its merits and sustainable scheduling choices following its workflow. To further investigate the impact of its parallelism design, we empirically study four typical parallelisms from the perspective of energy demand on realistic testbeds. Finally, we discuss open challenges for sustainable collaborative edge training to point to future directions for edge-centric big AI model training.

Paper Structure (13 sections, 6 figures)

This paper contains 13 sections, 6 figures.

Figures (6)

  • Figure 1: An example scenario of intelligent voice assistant in a smart home, which is driven by a collaboratively trained big AI model.
  • Figure 2: Existing AI model training mechanisms versus collaborative edge training.
  • Figure 3: Overview of collaborative edge training workflow.
  • Figure 4: Illustration of different parallelisms in collaborative edge training. Different colors represent assignments to different edge devices.
  • Figure 5: Energy demand and measured latency per sample of collaborative edge training under different parallelisms in a homogeneous testbed. OOM indicates an out-of-memory error.
  • ...and 1 more figure
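
The Figure 5 caption refers to energy demand per sample, latency per sample, and out-of-memory (OOM) cases. The following is a minimal sketch of how such a per-sample energy figure could be derived from latency and average device power. It is not the authors' measurement procedure. The `Run` class, the `energy_per_sample_j` function, the field names, and every number are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """One hypothetical training configuration (all values are made up)."""
    name: str
    latency_s: float             # wall-clock latency per training sample
    device_power_w: List[float]  # average power draw of each participating device
    peak_mem_gb: float           # peak memory needed on a single device


def energy_per_sample_j(run: Run, device_mem_gb: float) -> Optional[float]:
    """Return the estimated energy per sample in joules, or None on OOM.

    Assumes every device is powered for the full per-sample latency,
    so energy = latency * (sum of average device powers).
    """
    if run.peak_mem_gb > device_mem_gb:
        return None  # the configuration does not fit in device memory
    return run.latency_s * sum(run.device_power_w)


# Hypothetical comparison on a homogeneous pool of devices with 8 GB each.
runs = [
    Run("data", latency_s=0.5, device_power_w=[10.0, 10.0], peak_mem_gb=6.0),
    Run("pipeline", latency_s=0.25, device_power_w=[10.0, 10.0], peak_mem_gb=4.0),
    Run("tensor", latency_s=1.0, device_power_w=[10.0, 10.0], peak_mem_gb=12.0),
]
for run in runs:
    energy = energy_per_sample_j(run, device_mem_gb=8.0)
    label = "OOM" if energy is None else f"{energy:.1f} J/sample"
    print(f"{run.name}: {label}")
```

The sketch treats memory as a hard feasibility check and energy as the quantity to compare among the feasible configurations, mirroring how the caption separates OOM cases from measured ones.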