Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices

Shengyuan Ye; Liekang Zeng; Xiaowen Chu; Guoliang Xing; Xu Chen

TL;DR

Asteroid addresses privacy-conscious on-device training by orchestrating multiple heterogeneous edge devices through Hybrid Pipeline Parallelism (HPP). It combines a dynamic-programming-driven parallelism planner with memory-aware micro-batch scheduling, and adds a lightweight fault-tolerant pipeline replay mechanism to adapt to device dynamics. The system achieves up to 12.2× training speedups over conventional parallelism methods and 2.1× over state-of-the-art hybrid parallelism methods. After device exits and failures, it recovers the training pipeline 14× faster than baselines while preserving comparable throughput. By partitioning models across devices and exploiting selective inter-stage communication, Asteroid reduces memory footprints and communication overhead, enabling scalable, privacy-preserving edge learning in realistic settings.
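Why micro-batch scheduling needs to be memory-aware can be seen from a rough estimate. Assume a common 1F1B (one-forward-one-backward) pipeline schedule: stage i (0-indexed) of an S-stage pipeline then stashes activations for up to S − i in-flight micro-batches, so early stages face the highest peak memory. The sketch below estimates that default per stage and caps it where a device's memory budget would be exceeded. The linear memory model, the `Stage` fields, and the `plan_inflight` helper are hypothetical illustrations, not Asteroid's scheduler.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    weight_mb: float         # parameters, gradients, and optimizer state held by this stage (MB)
    act_mb_per_micro: float  # activations stashed per in-flight micro-batch (MB), assumed > 0
    budget_mb: float         # device memory available for training (MB)


def one_f_one_b_inflight(num_stages: int, stage_idx: int) -> int:
    """Max micro-batches whose activations stage `stage_idx` holds under 1F1B."""
    return num_stages - stage_idx


def plan_inflight(stages: list[Stage], num_micro: int) -> list[int]:
    """Per-stage in-flight limit: the 1F1B default, capped by each device's memory budget.

    Raises ValueError if a stage cannot hold even one micro-batch's activations.
    """
    num_stages = len(stages)
    limits = []
    for i, st in enumerate(stages):
        default = min(one_f_one_b_inflight(num_stages, i), num_micro)
        free = st.budget_mb - st.weight_mb  # memory left after the stage's weights
        fit = int(free // st.act_mb_per_micro) if free > 0 else 0
        if fit < 1:
            raise ValueError(f"stage {i} cannot hold a single micro-batch")
        limits.append(min(default, fit))
    return limits
```

For instance, `plan_inflight([Stage(1000, 300, 1500), Stage(800, 300, 2000), Stage(600, 300, 4000)], 8)` returns `[1, 2, 1]`: the first stage's 1F1B default of three in-flight micro-batches is cut to one because only 500 MB remain after its weights. Such a cap lets the stage fit in memory at the cost of extra pipeline stalls, which is the trade-off a memory-aware scheduler has to balance.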

Abstract

On-device Deep Neural Network (DNN) training has been recognized as crucial for privacy-preserving machine learning at the edge. However, the intensive training workload and limited onboard computing resources pose significant challenges to the availability and efficiency of model training. While existing works address these challenges through native resource management optimization, we instead leverage our observation that edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources beyond a single terminal. We propose Asteroid, a distributed edge training system that breaks the resource walls across heterogeneous edge devices for efficient model training acceleration. Asteroid adopts hybrid pipeline parallelism to orchestrate distributed training, along with judicious parallelism planning that maximizes throughput under given resource constraints. Furthermore, we develop a fault-tolerant yet lightweight pipeline replay mechanism that tames device-level dynamics for training robustness and performance stability. We implement Asteroid on heterogeneous edge devices with both vision and language models; evaluations show up to 12.2× faster training than conventional parallelism methods and 2.1× faster training than state-of-the-art hybrid parallelism methods. Moreover, Asteroid recovers the training pipeline 14× faster than baseline methods while preserving comparable throughput despite unexpected device exits and failures.
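The parallelism planner is described as dynamic-programming driven. As a simplified illustration, not Asteroid's actual formulation, the sketch below partitions a chain of layers into contiguous pipeline stages, one stage per device in a fixed order, and minimizes the bottleneck stage time, since steady-state pipeline throughput is bounded by the slowest stage. The cost model is an assumption: per-layer compute cost divided by device speed, plus the time to send the boundary activation over a shared link charged to the sending stage. The function name and parameters are hypothetical. Asteroid's hybrid planner also decides how devices are grouped for data parallelism within a stage under resource constraints; the sketch omits that.

```python
import math
from functools import lru_cache


def plan_pipeline(layer_flops, device_speed, act_bytes, bandwidth):
    """Split layers 0..L-1 into len(device_speed) contiguous stages, one per device
    in the given order, minimizing the slowest stage time (the pipeline bottleneck).

    layer_flops[j]: compute cost of layer j (e.g. GFLOPs per micro-batch)
    device_speed[k]: throughput of device k (same units per second)
    act_bytes[j]: bytes sent to the next stage if a cut is placed after layer j
    bandwidth: device-to-device bandwidth in bytes per second
    Returns (bottleneck_seconds, list of (start, end) layer ranges, end exclusive).
    """
    n_layers, n_dev = len(layer_flops), len(device_speed)
    if n_layers < n_dev:
        raise ValueError("need at least one layer per device")

    # prefix[j] = total compute of layers[0:j], so any range sum is O(1).
    prefix = [0.0]
    for f in layer_flops:
        prefix.append(prefix[-1] + f)

    @lru_cache(maxsize=None)
    def best(start, k):
        # Optimal bottleneck for layers[start:] on devices k..n_dev-1,
        # plus the tuple of exclusive end indices for each of those stages.
        if k == n_dev - 1:
            return (prefix[n_layers] - prefix[start]) / device_speed[k], (n_layers,)
        result = (math.inf, ())
        # Leave at least one layer for each remaining device.
        for end in range(start + 1, n_layers - (n_dev - 1 - k) + 1):
            compute = (prefix[end] - prefix[start]) / device_speed[k]
            comm = act_bytes[end - 1] / bandwidth
            rest, cuts = best(end, k + 1)
            cand = max(compute + comm, rest)
            if cand < result[0]:
                result = (cand, (end,) + cuts)
        return result

    bottleneck, ends = best(0, 0)
    starts = (0,) + ends[:-1]
    return bottleneck, list(zip(starts, ends))
```

For example, `plan_pipeline([4, 4, 4, 4], [2, 1], [0, 0, 0, 0], 1)` places three layers on the faster device and one on the slower, giving a bottleneck of 6.0 time units. Making the activation after layer 2 expensive (`act_bytes=[0, 0, 10, 0]`) shifts the cut to an even 2/2 split with a bottleneck of 8.0, showing how communication cost can override pure compute balancing.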

Paper Structure

This paper contains 22 sections, 11 equations, 19 figures, 8 tables, and 2 algorithms.

Introduction
Motivation and Preliminaries
DNN Training on Resource-Constrained Edge Devices
Edge Collaborative Training with Data Parallelism and Pipeline Parallelism
Key Insight: Combining Data Parallelism with Pipeline Parallelism
Technical Challenges
Asteroid System Design
System Overview
Hybrid Pipeline Parallelism in Asteroid
Parallelism Planning
Fault-Tolerant Pipeline Replay
Implementation
Evaluation
Experimental Setup
Comparison with DP and PP
...and 7 more sections

Figures (19)

Figure 1: Left: Training latency breakdown in DP. Right: Bytes communicated per sample in DP and PP. Both experiments are conducted on an edge environment of three Jetson Nano devices with 100 Mbps device-to-device (D2D) bandwidth.
Figure 2: Illustration of HDP and HPP.
Figure 3: Asteroid Overview: a three-phase workflow comprising the Preprocessing, Planning, and Execution phases.
Figure 4: An instance of HPP with four edge devices.
Figure 5: Breakdown of the memory footprint during DNN training, profiled on a Jetson NX.
...and 14 more figures
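Figure 1 (right) compares the bytes communicated per sample under DP and PP. A back-of-the-envelope model, assumed here for illustration and not the measurement method behind the figure, makes the gap plausible. In DP, every device synchronizes the full gradient each iteration; with a ring all-reduce each of N devices sends about 2(N − 1)/N times the model size, amortized over the global batch. In PP, only boundary activations (forward) and their gradients (backward) cross each stage cut, once per sample. The function names and all numbers below are hypothetical.

```python
def dp_bytes_per_sample(model_bytes: float, num_devices: int, global_batch: int) -> float:
    """Total bytes sent by all devices per training sample under DP with ring all-reduce."""
    # Each device sends 2*(N-1)/N * model_bytes per iteration; summed over N devices
    # that is 2*(N-1) * model_bytes, shared by every sample in the global batch.
    per_iteration = 2 * (num_devices - 1) * model_bytes
    return per_iteration / global_batch


def pp_bytes_per_sample(boundary_act_bytes: list[float]) -> float:
    """Total bytes per sample under PP: each cut sends an activation forward
    and an equally sized activation gradient backward."""
    return 2 * sum(boundary_act_bytes)
```

With a hypothetical 100 MB model, three devices, and a global batch of 32, `dp_bytes_per_sample(100e6, 3, 32)` gives 12.5 MB per sample, whereas a two-cut pipeline whose boundary activations are 0.5 MB each gives `pp_bytes_per_sample([0.5e6, 0.5e6])` = 2 MB per sample. Under this model the DP cost scales with model size, while the PP cost depends only on the activation sizes at the chosen cut points.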
